We provide a nonasymptotic analysis of convergence to stationarity for a collection of Markov chains on multivariate state spaces, from arbitrary starting points, thereby generalizing results in [Khare and Zhou Ann. Appl. Probab. 19 (2009) 737-777]. Our examples include the multi-allele Moran model in population genetics and its variants in community ecology, a generalized Ehrenfest urn model and variants of the Polya urn model. It is shown that all these Markov chains are stochastically monotone with respect to an appropriate partial ordering. Then, using a generalization of the results in [Diaconis, Khare and Saloff-Coste Sankhya 72 (2010) 45-76] and [Wilson Ann. Appl. Probab. 14 (2004) 274-325] (for univariate totally ordered spaces) to multivariate partially ordered spaces, we obtain explicit nonasymptotic bounds for the distance to stationarity from arbitrary starting points. In previous literature, bounds, if any, were available only from special starting points. The analysis also works for nonreversible Markov chains, and allows us to analyze cases of the multi-allele Moran model not considered in [Khare and Zhou Ann. Appl. Probab. 19 (2009) 737-777].

1. Introduction. The theory of Markov chains plays a prominent role in the fields of statistics and applied probability. Markov chains have a wide range of applications in numerous areas, from particle transport through finite state machines to the theory of gene expression. Some important applications include modeling scientific phenomena in population genetics, statistical physics and image processing. Another important use is simulating from an intractable probability distribution. It is well known that, under mild conditions discussed in [1], a Markov chain converges to its stationary distribution. In the applications mentioned above, it is often useful to know exactly how long to run the Markov chain until it is sufficiently close to the stationary distribution. Answering this question as accurately as possible is what a "nonasymptotic convergence analysis" of Markov chains is all about. The applied probability community has made significant strides in this area in the past three decades. Despite this progress, answering this question remains a challenging task for various standard Markov chains arising in applied probability and statistics. There are various examples where currently available state-of-the-art techniques give upper bounds that are substantially larger than the correct answer, often by orders of magnitude.

In the current paper, we provide a nonasymptotic analysis of convergence to stationarity for a collection of Markov chains in population genetics. The analysis is based on a generalization of the monotone coupling argument to multivariate state spaces. These Markov chains appear as standard models in population genetics and ecology and include the multi-allele Moran process in population genetics and its variants in community ecology, a generalized Ehrenfest urn model and the Polya urn process. These Markov chains were analyzed in [9], where the authors provide an exact convergence analysis in terms of the "chi-square distance" by using spectral techniques. But their analysis is somewhat incomplete because it works only for some naturally selected starting points. Stochastic monotonicity of a Markov chain, along with the knowledge of a monotone eigenfunction (see [3] and [17]), can be used to obtain a nonasymptotic convergence analysis from an arbitrary starting point. Existing results in [3] and [17] require a total ordering of the state space, which is generally available for univariate state spaces. In multivariate state spaces, however, there often exists only a natural partial ordering. We prove that the Markov chains considered in this paper are stochastically monotone with respect to an appropriate partial ordering; see Theorems 3.1, 3.2 and 3.3. But stochastic monotonicity of a Markov chain with respect to a partial ordering, even with the knowledge of a monotone eigenfunction, is not enough to obtain the desired convergence bounds. However, an additional condition, satisfied by all the Markov chains under consideration in this paper, enables us to obtain useful convergence bounds; see Theorem 2.1. Another limitation of the spectral techniques used in [9] is that they require reversibility of the Markov chain under consideration. The coupling argument presented in this paper also works for nonreversible Markov chains. Using this, for example, we are able to obtain explicit convergence bounds for generalizations of the standard multi-allele Moran model which are nonreversible.

Another important point is that, among the three classes of examples considered in this paper, the stationary distribution and the second largest eigenvalue are known for the Markov chains corresponding to the generalized Ehrenfest urn models and the Polya urn models (the stationary distribution is unknown for the general multi-allele Moran model). Hence, for these two models, from a general starting point $x$, one could potentially consider the crude upper bound $\frac{\lambda^n}{2\sqrt{\pi(x)}}$ for the total variation distance from stationarity after $n$ steps. Here $\pi(x)$ denotes the mass put by the stationary distribution at $x$, and $\lambda$ denotes the second largest eigenvalue. However, the upper bounds derived in this paper mostly provide a significant improvement over the crude upper bound; see the remarks in Section 3.2.1 and Section 3.3.1.

Here is an example of our results. The Unified Neutral Theory of Biodiversity and Biogeography (UNTB) is an important theory proposed by ecologist Stephen Hubbell in his monograph [7], which is used in the study of diversity and species abundances in ecological communities. There are two levels in Hubbell's theory: a metacommunity and a local community.

We concentrate here on the evolution of the local community. The local community has constant population size $N$ with $d$ different species. At each step, one individual is randomly chosen to die and is replaced by a new individual. With probability $m$, the new individual is chosen randomly from the metacommunity, which has proportion $p_i$ of species $i$ ($i = 1, 2, \ldots, d$). With probability $1 - m$, the new individual is randomly chosen from the remaining $N - 1$ individuals in the local community. This process is a variant of the so-called multi-allele Moran model in population genetics [5]. The metacommunity evolves on a much larger time scale and is assumed to be fixed during the evolution of the local community.

A very important issue of both practical and theoretical interest is to determine how soon a local community reaches equilibrium (see McGill [11]). Let $K(\cdot, \cdot)$ be the transition density of our local community Markov chain with state space $\mathcal{X}$ and stationary density $\pi$. Let $x \in \mathcal{X}$ be the initial state of the Markov chain. We are interested in answering the following question: for arbitrary $\varepsilon > 0$, how many steps $n$ are needed so that the total variation distance between the density of the Markov chain after $n$ steps and the stationary density is less than $\varepsilon$? More precisely, we want to find $n$ such that

$$\|K^n_x - \pi\|_{TV} = \frac{1}{2}\sum_{y \in \mathcal{X}} |K^n_x(y) - \pi(y)| < \varepsilon,$$

where $K^n_x$ denotes the density of the chain started at state $x$ after $n$ steps.¹
Khare and Zhou [9] provide an exact answer to this question in terms of the "chi-square distance" by using spectral techniques, when all individuals belong to the same species to begin with. So the problem of providing any nonasymptotic convergence bounds from an arbitrary starting point was still unresolved. Convergence bounds for the general local community Markov chain, from an arbitrary starting point, are provided in Section 3.1. Note that the upper and lower bounds obtained do not match exactly, but they are within a reasonable range of each other. Considering that no useful analysis was previously available from an arbitrary starting point, the bounds provided are a significant step forward.

¹For ease of exposition, if $f$ and $g$ are densities with respect to the counting measure on a finite state space $\mathcal{X}$, $\|f - g\|_{TV}$ will denote the total variation distance between the probability measures corresponding to $f$ and $g$.
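As a concrete illustration of the quantity $\|K^n_x - \pi\|_{TV}$ defined above, the following Python sketch computes it for a small hypothetical three-state chain. The matrix `K`, its stationary density `pi` and the helper names are ours, chosen only for illustration; for this chain the distance from state 0 works out to $2^{-(n+1)}$.

```python
import numpy as np

def tv_distance(f, g):
    """Total variation distance between two densities on a finite space:
    half the l1 distance between them."""
    return 0.5 * float(np.abs(np.asarray(f) - np.asarray(g)).sum())

def n_step_density(K, x, n):
    """Row x of K^n, i.e. the density K^n_x of the chain started at x."""
    return np.linalg.matrix_power(K, n)[x]

# Hypothetical 3-state chain (illustration only) and its stationary density.
K = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
assert np.allclose(pi @ K, pi)       # pi is stationary for K

for n in (1, 5, 20):
    print(n, tv_distance(n_step_density(K, 0, n), pi))
```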

As an illustration, note that under a suitable parametrization (see [9]), the local community process of Hubbell is the same as the Polya down-up model; see Section 3.2. Suppose that the local community has population size $N = 100$ with $d = 5$ species. With probability $m = 0.9$, the new individual is chosen randomly from the metacommunity with uniform species frequencies $p = (0.2, 0.2, 0.2, 0.2, 0.2)$. Let $X = (X_1, \ldots, X_5)$ be any (random) count vector of the local community, where $X_i$ is the count of individuals of species $i$. From Section 3.2, for the starting state $x = (0, 10, 0, 10, 80)$, the bounds on the total variation distance are obtained as



For $\varepsilon = 0.01$, (1.1) tells us that at least 401 steps are necessary and at most 1018 steps are sufficient for the total variation distance to be less than 0.01. The crude upper bound for the total variation distance is $(2.2186 \times 10^{19})(1 - 1/111)^n$, which shows that 5432 steps are sufficient for the total variation distance to be less than 0.01.
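The crude-bound arithmetic can be reproduced with a short Python sketch. Two assumptions here are ours and are not stated in this section: that the crude bound has the form $\lambda^n/(2\sqrt{\pi(x)})$ with $\pi$ the Dirichlet-multinomial stationary law of the Polya down-up model, and that its parameters are $\alpha_i = 180$ for each of the five species. We chose $\alpha_i = 180$ because, under the first assumption, it reproduces the constant $2.2186 \times 10^{19}$ quoted above. The helper names are hypothetical.

```python
import math

def log_dirichlet_multinomial(x, alpha):
    """log pi(x) for the Dirichlet-multinomial with parameters N = sum(x) and alpha."""
    N, A = sum(x), sum(alpha)
    out = math.lgamma(N + 1) + math.lgamma(A) - math.lgamma(A + N)
    for xi, ai in zip(x, alpha):
        out += math.lgamma(ai + xi) - math.lgamma(ai) - math.lgamma(xi + 1)
    return out

def steps_needed(constant, rate, eps):
    """Smallest n >= 0 with constant * rate**n < eps, for 0 < rate < 1."""
    n = max(0, math.ceil(math.log(eps / constant) / math.log(rate)))
    while constant * rate ** n >= eps:          # guard against round-off
        n += 1
    while n > 0 and constant * rate ** (n - 1) < eps:
        n -= 1
    return n

x = (0, 10, 0, 10, 80)
alpha = (180.0,) * 5                            # assumption, see lead-in
crude_constant = 0.5 * math.exp(-0.5 * log_dirichlet_multinomial(x, alpha))
print(f"{crude_constant:.4e}")
print(steps_needed(2.2186e19, 1 - 1 / 111, 0.01))
```

The step count uses only the quoted constant and rate, so it does not depend on the assumed $\alpha$.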

The paper is organized as follows. In Section 2, we provide the necessary background on stochastic monotonicity, and then prove Theorem 2.1, which generalizes the results in [3] and [17] to multivariate partially ordered finite state spaces to obtain convergence bounds under appropriate monotonicity assumptions. In Section 3, three classes of Markov chains are considered: the multi-allele Moran model, the generalized Ehrenfest urn model and the generalized Polya urn model. Each of these Markov chains is shown to be stochastically monotone with respect to an appropriate partial ordering, and also shown to satisfy the other assumptions of Theorem 2.1. All these results are combined to provide nonasymptotic convergence bounds for these classes of Markov chains from arbitrary starting points. We conclude the paper with a short discussion in Section 4.

2. Monotone Markov chains. 

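2.1. Background. Let $\mathcal{X}$ be a finite state space with total ordering $\le$. Let $K(\cdot, \cdot)$ be a Markov kernel on $\mathcal{X}$. We say $K$ is stochastically monotone if for all $x \in \mathcal{X}$ and $x' \in \mathcal{X}$ with $x \le x'$,

$$\sum_{y \le y'} K(x, y) \ge \sum_{y \le y'} K(x', y) \quad \text{for all } y' \in \mathcal{X}.$$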
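For a totally ordered finite state space, the definition above can be checked mechanically: the row-wise cumulative sums of $K$ must be nonincreasing in $x$. The following Python sketch does this, assuming the states are $0, 1, \ldots, |\mathcal{X}| - 1$ in their natural order. The lazy Ehrenfest urn used as a test case, and the helper names, are our own illustration rather than one of the paper's examples. For a birth-death chain the condition reduces to $K(x, x+1) + K(x+1, x) \le 1$, which holds for the lazy urn but fails for the non-lazy one.

```python
import numpy as np

def is_stochastically_monotone(K, tol=1e-12):
    """Check sum_{y<=y'} K(x,y) >= sum_{y<=y'} K(x',y) for all x <= x' and y'.

    States are assumed to be 0, 1, ..., len(K) - 1 with their natural order.
    """
    cdf = np.cumsum(K, axis=1)          # cdf[x, y'] = sum_{y <= y'} K(x, y)
    # Monotone iff every column of cdf is nonincreasing in x.
    return bool(np.all(np.diff(cdf, axis=0) <= tol))

def ehrenfest(n, laziness):
    """Ehrenfest urn with n balls that holds still with probability `laziness`."""
    K = np.zeros((n + 1, n + 1))
    for x in range(n + 1):
        if x > 0:
            K[x, x - 1] = (1 - laziness) * x / n
        if x < n:
            K[x, x + 1] = (1 - laziness) * (n - x) / n
        K[x, x] = laziness
    return K

print(is_stochastically_monotone(ehrenfest(10, 0.5)))  # True
print(is_stochastically_monotone(ehrenfest(10, 0.0)))  # False
```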
Monotone Markov chains have been thoroughly studied and applied; see Lund and Tweedie [10], Stoyan [15] and the references therein. They are currently popular because of "coupling from the past"; see David Wilson's website on perfect sampling, http://research.microsoft.com/en-us/um/people/dbwilson/exact, for extensive references on this subject.

Alternatively, if the state space of a Markov chain is totally ordered (e.g., a subset of $\mathbb{Z}$ or $\mathbb{R}$), then the Markov chain with corresponding transition operator $K$ is stochastically monotone if for every monotone function $f : \mathcal{X} \to \mathbb{R}$, the function $Kf$ is also monotone. There is a standard coupling technique available for monotone Markov chains on totally ordered spaces. Wilson [17] uses this coupling technique in the presence of an explicit eigenfunction to provide general convergence bounds for stochastically monotone Markov chains on totally ordered finite state spaces. Diaconis, Khare and Saloff-Coste [3] provide extensions to general state spaces and use these results to analyze certain two-component Gibbs samplers.
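The standard coupling for totally ordered spaces can be sketched by driving two copies of the chain with the same uniform random numbers through the inverse of each row's cumulative distribution function. Stochastic monotonicity then keeps the copies ordered, and once they meet they move together. The Python sketch below uses a lazy Ehrenfest urn with 10 balls as a hypothetical example, for which $f(x) = x - 5$ is an eigenfunction with eigenvalue $\lambda = 0.9$. It checks empirically that $P(X_n \ne Y_n) \le E[Y_n - X_n] = \lambda^n (y_0 - x_0)$, the chain of inequalities used in Wilson's argument and in the proof of Theorem 2.1 below. All names are ours.

```python
import bisect
import itertools
import random

def lazy_ehrenfest_cdfs(n):
    """Row-wise CDFs of the lazy Ehrenfest chain on {0, ..., n} (hypothetical example)."""
    cdfs = []
    for x in range(n + 1):
        row = [0.0] * (n + 1)
        if x > 0:
            row[x - 1] = 0.5 * x / n
        if x < n:
            row[x + 1] = 0.5 * (n - x) / n
        row[x] = 0.5
        cdfs.append(list(itertools.accumulate(row)))
    return cdfs

def step(cdfs, x, u):
    """Inverse-CDF update: the smallest y with CDF(x, y) > u."""
    return min(bisect.bisect_right(cdfs[x], u), len(cdfs) - 1)

def coupled_chains(cdfs, x0, y0, n, rng):
    """Run two copies from x0 <= y0, driven by the same uniforms."""
    x, y = x0, y0
    for _ in range(n):
        u = rng.random()
        x, y = step(cdfs, x, u), step(cdfs, y, u)
        assert x <= y                     # the order is preserved at every step
    return x, y

n_balls, n_steps, trials = 10, 30, 20000
cdfs = lazy_ehrenfest_cdfs(n_balls)
rng = random.Random(0)
gaps = [y - x for x, y in (coupled_chains(cdfs, 0, n_balls, n_steps, rng)
                           for _ in range(trials))]
p_hat = sum(g > 0 for g in gaps) / trials           # estimate of P(X_n != Y_n)
mean_hat = sum(gaps) / trials                       # estimate of E[Y_n - X_n]
bound = (1 - 1 / n_balls) ** n_steps * n_balls      # lambda^n (f(y0) - f(x0)) / c1
print(p_hat, mean_hat, bound)
```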

However, for multivariate state spaces, there is often no natural total ordering, but there exists a natural partial ordering. For example, if $\mathcal{X}$ consists of $d$-dimensional vectors, then entrywise domination gives rise to a standard partial ordering. A Markov chain with corresponding transition operator $K$ is monotone with respect to a partial ordering if, whenever $f : \mathcal{X} \to \mathbb{R}$ is monotone with respect to the partial ordering, $Kf$ is monotone with respect to the partial ordering. See Fill and Machida [6], Beskos and Roberts [2], Roberts and Rosenthal [14] and the references therein for varied applications. The literature on perfect sampling mainly consists of various techniques for simulating from specific distributions on partially ordered spaces with a unique minimal and maximal element; see Propp and Wilson [13]. Note that, unlike perfect sampling, our focus is to analyze given Markov chains corresponding to specific models, and not to devise Markov chains to simulate from a specified distribution.

The theorem listed below generalizes earlier results in Wilson [17] and 
Diaconis, Khare and Saloff-Coste [3] (for univariate totally ordered spaces) 
to multivariate partially ordered spaces in order to obtain nonasymptotic 
convergence results. 

2.2. Convergence of monotone Markov chains: General result. 

Theorem 2.1. Let $K$ be the transition density of a Markov chain on a finite state space $\mathcal{X}$ equipped with a partial ordering $\preceq$. Suppose that $K$ has a stationary distribution with density $\pi$, and the following conditions are satisfied:

(a) $K$ is monotone with respect to the partial ordering $\preceq$.

(b) (Pairwise dominance property) For arbitrary $x$ and $y$ in $\mathcal{X}$, there exists $z(x, y)$ (possibly depending on $x$ and $y$) such that $z$ either dominates both $x$ and $y$ or is dominated by both $x$ and $y$ with respect to $\preceq$.

(c) $\lambda \in (0, 1)$ is an eigenvalue of $K$ with strictly monotone eigenfunction $f$ such that

$$c_1 = \inf\{f(y^*) - f(x^*) : x^*, y^* \in \mathcal{X},\ x^* \preceq y^*,\ x^* \ne y^*\} > 0, \qquad c_2 = \sup_{x \in \mathcal{X}} |f(x)| > 0.$$

Then for any starting state $x$,

$$\frac{\lambda^n}{2c_2}|f(x)| \le \|K^n_x - \pi\|_{TV} \le \frac{\lambda^n}{c_1} E|f(Y) + f(x) - 2f(z(x, Y))|,$$

where $Y \sim \pi$.

Proof. Let $x^* \in \mathcal{X}$ and $y^* \in \mathcal{X}$ satisfy $x^* \preceq y^*$. It is well known that if a probability distribution $\mu$ on $\mathcal{X}$ is stochastically dominated by another probability distribution $\nu$ on $\mathcal{X}$, that is, $\int f\,d\mu \le \int f\,d\nu$ for every monotone function $f$, then we can construct random variables $X$ and $Y$ such that $X \sim \mu$, $Y \sim \nu$ and $X \preceq Y$; see, for example, [8]. Since $K$ is monotone with respect to the partial ordering, by repeated application of this result we can construct two coupled Markov chains $\{X_n\}_{n \ge 0}$ and $\{Y_n\}_{n \ge 0}$ such that $X_0 = x^*$, $Y_0 = y^*$ and $X_n \preceq Y_n$ for every $n \ge 1$. Further, if $X_{n_0} = Y_{n_0}$, then $X_n = Y_n$ for all $n \ge n_0$.

It follows that for any $n \ge 1$,

$$\|K^n_{x^*} - K^n_{y^*}\|_{TV} \le P(X_n \ne Y_n \mid X_0 = x^*, Y_0 = y^*) \le E\left[\frac{f(Y_n) - f(X_n)}{c_1} \,\middle|\, X_0 = x^*, Y_0 = y^*\right].$$

The previous inequality uses $X_n \preceq Y_n$, the strict monotonicity of $f$ and the hypothesis that $f(y) - f(x) \ge c_1$ if $x \preceq y$, $x \ne y$.

Next, since $f$ is an eigenfunction of $K$, it follows that

$$E\{f(Y_k) - f(X_k) \mid X_{k-1}, Y_{k-1}\} = \lambda\{f(Y_{k-1}) - f(X_{k-1})\}$$

for every $k \ge 1$. Therefore,

$$E\left[\frac{f(Y_n) - f(X_n)}{c_1} \,\middle|\, X_0 = x^*, Y_0 = y^*\right] = E\left[E\left\{\frac{f(Y_n) - f(X_n)}{c_1} \,\middle|\, X_{n-1}, Y_{n-1}\right\} \,\middle|\, X_0 = x^*, Y_0 = y^*\right] = \frac{\lambda}{c_1} E\{f(Y_{n-1}) - f(X_{n-1}) \mid X_0 = x^*, Y_0 = y^*\} = \cdots = \frac{\lambda^n}{c_1}\{f(y^*) - f(x^*)\}.$$

Note that the argument above holds for any $x^* \preceq y^*$.

Now, for any $x \ne y$, by the pairwise dominance assumption there exists $z = z(x, y)$ (possibly depending on $x$ and $y$) such that $z$ dominates both $x$ and $y$ or is dominated by both $x$ and $y$. Hence,

$$\|K^n_x - K^n_y\|_{TV} \le \|K^n_x - K^n_z\|_{TV} + \|K^n_z - K^n_y\|_{TV} \le \frac{\lambda^n}{c_1}|f(x) - f(z)| + \frac{\lambda^n}{c_1}|f(y) - f(z)| = \frac{\lambda^n}{c_1}|f(x) + f(y) - 2f(z)|.$$

The previous equality follows from the fact that $z$ either dominates or is dominated by both $x$ and $y$, and $f$ is monotone with respect to $\preceq$, which implies that $f(x) - f(z)$ and $f(y) - f(z)$ are either both nonnegative or both nonpositive. Convexity now yields

$$\|K^n_x - \pi\|_{TV} \le \sum_{y \in \mathcal{X}} \pi(y)\|K^n_x - K^n_y\|_{TV} \le \frac{\lambda^n}{c_1} E|f(x) + f(Y) - 2f(z(x, Y))|.$$

To get the lower bound, note that $E_\pi f(Y) = 0$ (since $\pi K = \pi$ gives $E_\pi f = E_\pi Kf = \lambda E_\pi f$, and $\lambda < 1$), so that

$$\|K^n_x - \pi\|_{TV} \ge \frac{1}{2c_2}|E_{K^n_x}(f(Y)) - E_\pi(f(Y))| = \frac{\lambda^n}{2c_2}|f(x)|.$$

Hence the theorem is proved. □
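Theorem 2.1 can be sanity-checked numerically on a small example. The Python sketch below is an illustration under our own parameter choices, not taken from the paper. It uses the two-allele Moran chain of Section 3.1 with $M = (1 - m)I + mP$, written as a chain on $x_1 \in \{0, \ldots, N\}$; here the order is total, the state $x_1 = 0$ is dominated by every state, $f(x) = x_1 - Np_1$ is an eigenfunction with $\lambda = 1 - m/N$, and $c_1 = 1$. It computes $\pi$ by solving $\pi K = \pi$ and verifies that the exact total variation distance lies between the two bounds of the theorem for every $n$ up to 150.

```python
import numpy as np

def moran_two_allele(N, m, p):
    """Moran chain (3.1) with d = 2 and M = (1 - m) I + m P, rows of P = (p, 1 - p),
    written as a chain on x1 in {0, ..., N}."""
    K = np.zeros((N + 1, N + 1))
    for x in range(N + 1):
        birth1 = (1 - m) * x / N + m * p          # P(the offspring is of species 1)
        if x < N:
            K[x, x + 1] = (N - x) / N * birth1    # a species-2 individual dies
        if x > 0:
            K[x, x - 1] = x / N * (1 - birth1)    # a species-1 individual dies
        K[x, x] = 1.0 - K[x].sum()
    return K

N, m, p = 10, 0.3, 0.4
K = moran_two_allele(N, m, p)
f = np.arange(N + 1) - N * p                      # f(x) = x1 - N p1
lam = 1 - m / N
assert np.allclose(K @ f, lam * f)                # eigenfunction check

# Stationary density: solve pi (K - I) = 0 together with sum(pi) = 1.
A = np.vstack([K.T - np.eye(N + 1), np.ones(N + 1)])
b = np.append(np.zeros(N + 1), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]

c1, c2 = 1.0, np.abs(f).max()
z = 0                                             # x1 = 0 is dominated by every state
x0 = 7
dist = np.zeros(N + 1)
dist[x0] = 1.0
for n in range(1, 151):
    dist = dist @ K                               # dist is now K^n_{x0}
    tv = 0.5 * np.abs(dist - pi).sum()
    lower = lam ** n / (2 * c2) * abs(f[x0])
    upper = lam ** n / c1 * (pi * np.abs(f + f[x0] - 2 * f[z])).sum()
    assert lower <= tv + 1e-12 and tv <= upper + 1e-12
print("bounds hold for n = 1, ..., 150")
```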

Remark. (1) Theorem 2.1 applies to an arbitrary starting point and does not require reversibility. In Section 3.1, we show that the bounds on the total variation distance can be obtained without explicit knowledge of the stationary distribution.

(2) In all our examples, there will be a unique minimal element, which is dominated by every state (there need not be a unique maximal element). Taking $z(x, y)$ to be this minimal element for all $x$ and $y$ shows that the pairwise dominance condition is satisfied.



We now apply this general result to a variety of Markov chains in population genetics.



3. Applications. 



3.1. The Moran process in population genetics. The classical Moran process in population genetics models the evolution of a population of constant size by random replacement followed by mutation. Suppose there are $d$ species in a population of size $N$. At each step, one individual is chosen uniformly to die and, independently, another is chosen uniformly to reproduce. They may be the same individual. If the latter is of species $i$, the offspring mutates to type $j$ with probability $m_{ij}$, $1 \le j \le d$. Let $X_n = (X_{n1}, \ldots, X_{nd})$ be the vector of counts of species $1, 2, \ldots, d$ at the $n$th step. Let $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. Then $\{X_n\}_{n \ge 0}$ forms a Markov chain on $\mathcal{X}_N^d$, where

$$\mathcal{X}_N^d = \Big\{x = (x_1, \ldots, x_d) \in \mathbb{N}_0^d : \sum_{i=1}^d x_i = N\Big\}.$$

Let $K$ denote the transition density of this Markov chain. Note that the size of the state space is $|\mathcal{X}_N^d| = \binom{N+d-1}{d-1}$. The one-step transition probabilities are

$$K(x, y) = \begin{cases} \dfrac{x_j}{N}\displaystyle\sum_{k=1}^{d}\frac{x_k}{N}\,m_{ki}, & y = x + e_i - e_j,\ 1 \le i \ne j \le d,\\[6pt] 1 - \displaystyle\sum_{1 \le i \ne j \le d} K(x, x + e_i - e_j), & y = x,\\[6pt] 0, & \text{otherwise}, \end{cases} \tag{3.1}$$

where $e_i$ is the unit vector with $i$th entry equal to 1. The mutation matrix $M$ is assumed to be irreducible. This ensures the irreducibility and aperiodicity of the transition function $K$; see the proof in the Appendix. Hence, the stationary distribution of $K$ exists. Let $\pi$ denote the density of the stationary distribution with respect to the counting measure.
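The transition rule (3.1) is easy to tabulate for small $N$ and $d$. The Python sketch below (helper names hypothetical) enumerates $\mathcal{X}_N^d$, fills in $K$ from (3.1), and checks that the number of states is $\binom{N+d-1}{d-1}$ and that every row is a probability vector. The mutation matrix is the standard choice (3.2) with $d = 3$, $m = 0.3$ and uniform $p$, picked only for illustration.

```python
import itertools
import math
import numpy as np

def compositions(N, d):
    """All x in N_0^d with x_1 + ... + x_d = N, i.e. the state space X_N^d."""
    return [x for x in itertools.product(range(N + 1), repeat=d) if sum(x) == N]

def moran_kernel(N, M):
    """Transition matrix (3.1): species j dies w.p. x_j/N; the offspring of a uniformly
    chosen parent (species k w.p. x_k/N) becomes species i w.p. M[k, i]."""
    d = M.shape[0]
    states = compositions(N, d)
    index = {s: r for r, s in enumerate(states)}
    K = np.zeros((len(states), len(states)))
    for r, x in enumerate(states):
        birth = np.array(x, dtype=float) @ M / N      # birth[i] = sum_k (x_k/N) m_ki
        for i, j in itertools.permutations(range(d), 2):
            if x[j] > 0:
                y = list(x)
                y[i] += 1
                y[j] -= 1
                K[r, index[tuple(y)]] += x[j] / N * birth[i]
        K[r, r] = 1.0 - K[r].sum()                    # probability of no change
    return states, K

m = 0.3
M = (1 - m) * np.eye(3) + m * np.full((3, 3), 1 / 3)  # (3.2) with uniform p
states, K = moran_kernel(5, M)
print(len(states), math.comb(5 + 3 - 1, 3 - 1))
```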

This model ($d = 2$) is due to Moran [12]. Background and references can be found in the text by Ewens [5]. When $d = 2$, in the continuous-time setting, Donnelly and Rodrigues [4] obtain an upper bound in terms of the separation and total variation distances, when all the individuals belong to the same generation initially. Watkins [16] analyzes the infinite-allele Moran model in the discrete-time setting. However, unlike the multi-allele case, the (infinite) vector of species counts does not form a Markov chain. Instead, the $N$-dimensional vector whose $i$th entry is the number of species with $i$ individuals at the current stage forms a Markov chain. It is this fundamentally different Markov chain that is analyzed in Watkins [16] using strong stationary times.

In the multi-allele case, which we analyze, a standard choice of the mutation matrix $M = (m_{ij})_{1 \le i,j \le d}$ is

$$M = (1 - m)I + mP, \tag{3.2}$$

where $0 < m < 1$ is the mutation probability of the offspring, and $P$ is a stochastic matrix each of whose rows is $(p_1, \ldots, p_d)$, a probability vector with positive entries. If mutation happens, the offspring changes to species $i$ with probability $p_i$. It is known from the literature that for this standard choice of the mutation matrix $M$, the corresponding Markov chain is reversible. Khare and Zhou [9] analyze this Markov chain and provide nonasymptotic convergence bounds in terms of the "chi-square distance" for some naturally selected starting points. In this paper, we generalize this analysis in two directions. First, instead of considering the choice $M = (1 - m)I + mP$, we consider a general subclass of mutation matrices described in (3.3)-(3.5), which includes this choice as a special case. Second, we provide nonasymptotic convergence bounds from an arbitrary starting point. Consider the class of mutation matrices $M$ satisfying one of the monotonicity conditions specified below:

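$$m_{dj} < \min_{1 \le k \le d-1} m_{kj} \quad \text{for every } 1 \le j \le d - 1, \tag{3.3}$$

or,

$$m_{dj} \le \min_{1 \le k \le d-1} m_{kj} \ \text{ for every } 1 \le j \le d - 1, \text{ and } M^* = (m^*_{ij})_{1 \le i,j \le d-1} \text{ is irreducible, where } m^*_{ij} = m_{ij} - m_{dj}, \tag{3.4}$$

or,

$$m_{dj} \le \min_{1 \le k \le d-1} m_{kj} \ \text{ for every } 1 \le j \le d - 1, \text{ and } M^* \text{ has an eigenvector with all entries strictly positive.} \tag{3.5}$$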

Each of these conditions essentially says that there is a species, which we call species $d$ without loss of generality, such that for every species $j \le d - 1$, the probability that an offspring of species $d$ mutates to species $j$ is no larger than the probability that an offspring of any other species mutates to species $j$.

Note that for a general $M$ satisfying any one of these three conditions, the Markov kernel $K$ is nonreversible, and in this case the stationary distribution of $K$ is often not known. Note also that condition (3.5) is satisfied by the standard choice $M = (1 - m)I + mP$, and hence the analysis of this standard choice comes out as a special case. An example where conditions (3.3) and (3.4) are satisfied is the following. Suppose $m_{d1} = \delta$ and $m_{dd} = 1 - \delta$, that is, an offspring born to species $d$ can mutate only to species 1, with a small probability $\delta$. Suppose $m_{1d} > 0$, that is, species 1 can also mutate to species $d$ with positive probability. If all the mutation probabilities among species $1, 2, \ldots, d - 1$ are larger than $\delta$, that is, $m_{ij} > \delta$ for $1 \le i, j \le d - 1$, then conditions (3.3) and (3.4) are satisfied.
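Conditions (3.3) and (3.4) can be checked directly for a given mutation matrix. The Python sketch below is our own illustration (function names hypothetical): it tests (3.3), and tests (3.4) using the standard criterion that a nonnegative $r \times r$ matrix $A$ is irreducible if and only if $(I + A)^{r-1}$ has no zero entries. It is applied to a hypothetical instance of the example above with $d = 4$ and $\delta = 0.05$, and to the standard choice (3.2) with $d = 3$, which fails both (3.3) and (3.4) although it satisfies (3.5).

```python
import numpy as np

def satisfies_3_3(M):
    """(3.3): m_dj < min_{k<d} m_kj for every j < d (species d = last index)."""
    d = M.shape[0]
    return bool(np.all(M[d - 1, :d - 1] < M[:d - 1, :d - 1].min(axis=0)))

def is_irreducible(A):
    """A nonnegative r x r matrix is irreducible iff (I + A)^(r-1) > 0 entrywise."""
    r = A.shape[0]
    pattern = np.eye(r) + (A > 0)
    return bool(np.all(np.linalg.matrix_power(pattern, max(r - 1, 0)) > 0))

def satisfies_3_4(M):
    """(3.4): m_dj <= min_{k<d} m_kj for every j < d, and M* irreducible."""
    d = M.shape[0]
    if not np.all(M[d - 1, :d - 1] <= M[:d - 1, :d - 1].min(axis=0)):
        return False
    M_star = M[:d - 1, :d - 1] - M[d - 1, :d - 1]   # m*_ij = m_ij - m_dj
    return is_irreducible(M_star)

delta = 0.05
M_delta = np.array([[0.40, 0.25, 0.25, 0.10],      # species 1 can mutate to d
                    [0.30, 0.40, 0.30, 0.00],
                    [0.30, 0.30, 0.40, 0.00],
                    [delta, 0.00, 0.00, 1 - delta]])
m = 0.3
M_std = (1 - m) * np.eye(3) + m * np.full((3, 3), 1 / 3)   # (3.2), uniform p

print(satisfies_3_3(M_delta), satisfies_3_4(M_delta))   # True True
print(satisfies_3_3(M_std), satisfies_3_4(M_std))       # False False
```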

Let us introduce a partial ordering on $\mathcal{X}_N^d$. For $x, y \in \mathcal{X}_N^d$, we write $x \preceq y$ if $x_i \le y_i$ for $i = 1, 2, \ldots, d - 1$. This automatically implies $x_d \ge y_d$. To get bounds on the total variation distance, according to Theorem 2.1, we need an eigenfunction $f$ which is strictly monotone with respect to $\preceq$, that is, if $x, y \in \mathcal{X}_N^d$ with $x \preceq y$ and $x \ne y$, then $f(x) < f(y)$.

Proposition 3.1. Let $K$ denote the Moran process specified by (3.1), and suppose the mutation matrix $M$ satisfies any one of conditions (3.3)-(3.5). Then $K$ has a linear and strictly monotone eigenfunction $f$.



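Proof. Note that

$$E_x[X_1] = x + \sum_{1 \le i \ne j \le d} (e_i - e_j)\frac{x_j}{N}\left(\sum_{k=1}^d \frac{x_k}{N} m_{ki}\right) = x + \sum_{1 \le i, j \le d} (e_i - e_j)\frac{x_j}{N}\left(\sum_{k=1}^d \frac{x_k}{N} m_{ki}\right) = x + \sum_{1 \le i \le d} e_i\left(\sum_{k=1}^d \frac{x_k}{N} m_{ki}\right) - \sum_{1 \le j \le d} e_j \frac{x_j}{N}.$$

Let $a = (a_i)_{1 \le i \le d}$ be any eigenvector corresponding to an eigenvalue $\lambda$ of $M$. Then we have

$$E_x[a \cdot X_1] = \left\{\left(1 - \frac{1}{N}\right)a + \frac{1}{N}(Ma)\right\} \cdot x = \left\{\left(1 - \frac{1}{N}\right) + \frac{\lambda}{N}\right\} a \cdot x.$$

Hence, $f(x) = \sum_{i=1}^d a_i x_i$ is an eigenfunction of $K$ corresponding to the eigenvalue $(1 - \frac{1}{N}) + \frac{\lambda}{N}$.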

We now show that $M$ has an eigenvector $a$ such that $a_i > a_d$ for every $1 \le i \le d - 1$. It follows from condition (3.3) that $m^*_{ij} > 0$, and from condition (3.4) that $m^*_{ij} \ge 0$ and $M^*$ is irreducible. Hence, under condition (3.3) or (3.4), by the Perron-Frobenius theorem, the largest eigenvalue $\lambda^*$ of $M^*$ is positive with multiplicity 1, and there exists an eigenvector $a^* = (a^*_i)_{1 \le i \le d-1}$ corresponding to $\lambda^*$ such that $a^*$ has all positive entries. Under condition (3.5), we have directly assumed that $a^*$ has all positive entries (with $\lambda^*$ the corresponding eigenvalue). Note that

$$\lambda^* \le \max_{1 \le i \le d-1} \sum_{j=1}^{d-1} m^*_{ij} = \max_{1 \le i \le d-1} \sum_{j=1}^{d-1}(m_{ij} - m_{dj}) = \max_{1 \le i \le d-1}(m_{dd} - m_{id}) \le m_{dd} < 1,$$

since the mutation matrix $M$ is assumed to be irreducible.
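Let $c$ be defined by

$$c = \frac{\sum_{j=1}^{d-1} m_{dj} a^*_j}{\lambda^* - 1},$$

and $a$ be defined by

$$a_i = \begin{cases} a^*_i + c, & \text{if } 1 \le i \le d - 1,\\ c, & \text{if } i = d. \end{cases}$$

Note that, by the definition of $c$,

$$\sum_{j=1}^{d} m_{dj} a_j = \sum_{j=1}^{d-1} m_{dj} a^*_j + c = (\lambda^* - 1)c + c = \lambda^* c. \tag{3.6}$$

We have

$$M^* a^* = \lambda^* a^* \implies \sum_{j=1}^{d-1}(m_{ij} - m_{dj})(a_j - c) = \lambda^*(a_i - c) \quad \forall\, 1 \le i \le d - 1.$$

Note that $\sum_{j=1}^{d-1}(m_{ij} - m_{dj}) = m_{dd} - m_{id}$ and $a_d = c$. It follows that

$$\sum_{j=1}^{d}(m_{ij} - m_{dj}) a_j = \lambda^*(a_i - c), \quad 1 \le i \le d - 1. \tag{3.7}$$

Adding (3.6) and (3.7), we get $Ma = \lambda^* a$. This shows that $a$ is an eigenvector of $M$ corresponding to the eigenvalue $\lambda^*$.

Thus, $f(x) = \sum_{i=1}^d a_i x_i = \sum_{i=1}^{d-1}(a_i - a_d)x_i + N a_d$, which is strictly monotone with respect to $\preceq$ (because $a_i - a_d = a^*_i > 0$), is an eigenfunction of $K$ corresponding to the eigenvalue $\lambda = (1 - \frac{1}{N}) + \frac{\lambda^*}{N}$. Since $\lambda < 1$, it follows that $E_\pi[f(X)] = 0$. □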
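Both halves of the proof of Proposition 3.1 can be checked numerically. The Python sketch below is our illustration (helper names hypothetical). It builds $a^*$, $c$ and $a$ exactly as above for a mutation matrix satisfying (3.3), confirms $Ma = \lambda^* a$ and $a_i > a_d$, and then verifies on every state of $\mathcal{X}_N^d$ that $E_x[a \cdot X_1] = \{(1 - 1/N) + \lambda^*/N\}\, a \cdot x$ by summing over the moves of (3.1). The example matrix is chosen only for illustration.

```python
import itertools
import numpy as np

def monotone_eigenvector(M):
    """lambda*, a* and a as in the proof of Proposition 3.1 (species d = last index).
    Assumes M* has a Perron eigenvector with positive entries."""
    d = M.shape[0]
    M_star = M[:d - 1, :d - 1] - M[d - 1, :d - 1]   # m*_ij = m_ij - m_dj
    vals, vecs = np.linalg.eig(M_star)
    k = int(np.argmax(vals.real))                   # Perron root
    lam_star = float(vals[k].real)
    a_star = vecs[:, k].real
    a_star = a_star / a_star.sum()                  # fix the sign: positive entries
    c = float(M[d - 1, :d - 1] @ a_star) / (lam_star - 1)
    return lam_star, a_star, np.append(a_star + c, c)

def expected_linear(x, a, M):
    """E_x[a . X_1] for the Moran chain (3.1), summing over moves x -> x + e_i - e_j."""
    N, d = sum(x), len(x)
    birth = np.asarray(x, dtype=float) @ M / N      # birth[i] = sum_k (x_k/N) m_ki
    drift = sum(x[j] / N * birth[i] * (a[i] - a[j])
                for i, j in itertools.permutations(range(d), 2))
    return float(a @ np.asarray(x, dtype=float)) + drift

M = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.1, 0.8]])                     # satisfies (3.3)
lam_star, a_star, a = monotone_eigenvector(M)
assert np.allclose(M @ a, lam_star * a) and np.all(a[:-1] > a[-1])

N = 6
lam = (1 - 1 / N) + lam_star / N
states = [x for x in itertools.product(range(N + 1), repeat=3) if sum(x) == N]
for x in states:
    assert np.isclose(expected_linear(x, a, M), lam * float(a @ np.array(x)))
print(lam_star, a, lam)
```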

We now show that for the Moran process, $K$ is monotone with respect to the partial ordering $\preceq$.

Theorem 3.1. Let $K$ denote the Moran process specified by (3.1), where the mutation matrix $M$ satisfies one of the conditions specified in (3.3)-(3.5). Then $K$ is monotone with respect to the partial ordering $\preceq$.

Proof. Consider any $x \in \mathcal{X}_N^d$ and $y \in \mathcal{X}_N^d$ with $x \preceq y$. We construct two random vectors $X$ and $Y$ such that $X \preceq Y$ with $X \sim K(x, \cdot)$ and $Y \sim K(y, \cdot)$. This will immediately imply that $Kf(x) \le Kf(y)$ for any monotone function $f$ and any $x, y$ with $x \preceq y$.

Let $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$. Then by assumption $x_i \le y_i$ for every $1 \le i \le d - 1$. We now describe the procedure for obtaining $X$ and $Y$.
Table 1
Labeling of individuals of population 1 and population 2

Species            1st     2nd          3rd                      4th
x (count)          1       5            7                        4
Labels, pop. 1     1       2,3,4,5,6    7,8,9,10,11,12,13        14,15,16,17
y (count)          2       5            8                        2
Labels, pop. 2     1,16    2,3,4,5,6    7,8,9,10,11,12,13,17     14,15

In order to specify the coupling argument, consider two populations with $N$ individuals each. Population 1 has $x_i$ individuals of species $i$, and population 2 has $y_i$ individuals of species $i$, for every $1 \le i \le d$. We label the individuals in the two populations as follows. The individuals of the $i$th species of population 1 are labeled from $\sum_{j=1}^{i} x_{j-1} + 1$ to $\sum_{j=1}^{i} x_j$, $i = 1, 2, \ldots, d$, taking $x_0 = 0$. The labeling of the individuals of population 2 is done in the following way:

• Note that $x_i \le y_i$ for $i = 1, 2, \ldots, d - 1$. For the $i$th species of population 2, where $i = 1, 2, \ldots, d - 1$, we give $x_i$ of the individuals exactly the same labels as those in species $i$ of population 1. This leaves $y_i - x_i$ "extra individuals" to be labeled later.

• Note that $x_d \ge y_d$. For the $d$th species of population 2, the $y_d$ individuals get exactly the same labels as the first $y_d$ individuals of the $d$th species of population 1.

• Finally, all the "extra individuals" left over in the first $d - 1$ species of population 2 (there are exactly $\sum_{i=1}^{d-1}(y_i - x_i) = x_d - y_d$ of them, since both populations have $N$ individuals) get the $x_d - y_d$ labels in the $d$th species of population 1 which were not assigned in the previous step.

The following example illustrates the labeling of the $N$ individuals in population 1 and population 2. Consider $N = 17$ individuals who belong to $d = 4$ different species. Also consider $x = (1, 5, 7, 4)$ and $y = (2, 5, 8, 2)$. Table 1 illustrates the labeling technique.

In Table 1, we label the individuals of population 1 from 1 to 17 based on $x$. For the 1st species of population 2, there are 2 individuals; the first individual gets the label 1, the same as the label of the first individual of population 1, and the second individual is an "extra individual," to be labeled later. For the 2nd species, both populations have the same number of individuals, so these individuals get the same labels. For the 3rd species, there is one "extra individual," to be labeled later; the other individuals get the same labels. The 4th species has 2 individuals in population 2, who get the same labels as the first 2 individuals of the 4th species in population 1. Finally, the 2 leftover labels 16 and 17 are assigned to the "extra individuals" of species 1 and 3 of population 2, respectively.
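The labeling rules can be written as a short function. The Python sketch below (function name hypothetical) follows the three bullet rules; where the text leaves the order of assignment of the leftover labels unspecified, it hands them to the extra individuals in increasing order of species, which matches Table 1. Running it on $x = (1, 5, 7, 4)$ and $y = (2, 5, 8, 2)$ reproduces the table.

```python
def label_populations(x, y):
    """Labels (1..N) per species for populations 1 and 2, following the three
    bullet rules; assumes x_i <= y_i for i < d (the partial order x <= y)."""
    d = len(x)
    # Population 1: species i gets the consecutive block after species 1..i-1.
    pop1, start = [], 1
    for xi in x:
        pop1.append(list(range(start, start + xi)))
        start += xi
    # Population 2, species 1..d-1: reuse the x_i labels; extras are labeled later.
    pop2 = [list(pop1[i]) for i in range(d - 1)]
    # Species d: the first y_d labels of species d in population 1.
    pop2.append(pop1[d - 1][:y[d - 1]])
    # The x_d - y_d leftover labels go, in species order, to the extra individuals.
    leftover = iter(pop1[d - 1][y[d - 1]:])
    for i in range(d - 1):
        pop2[i].extend(next(leftover) for _ in range(y[i] - x[i]))
    return pop1, pop2

pop1, pop2 = label_populations((1, 5, 7, 4), (2, 5, 8, 2))
for species, (l1, l2) in enumerate(zip(pop1, pop2), start=1):
    print(species, l1, l2)
```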
Let us return to the general proof, and define $k_1 := \sum_{i=1}^{d-1} x_i$ to be the total number of individuals in the first $d - 1$ species of population 1 and $k_2 := y_d$ to be the number of individuals in species $d$ of population 2. We now change the species configuration of population 1 and population 2 in four sub-steps, described below:

(I) Choose a label uniformly between 1 and $N$. Call it $i_1$.

(II) Independently choose another label uniformly between 1 and $N$. Call it $i_2$.

(III) Let $s_{1,i_2}$ and $s_{2,i_2}$ denote the species of the individual labeled $i_2$ in population 1 and population 2, respectively. Add one individual of species $s_{1,i_2}$ to population 1 and one individual of species $s_{2,i_2}$ to population 2.

Note that if $1 \le i_2 \le k_1 + k_2$, then $s_{1,i_2} = s_{2,i_2} =: s$. In this case the newly added individual in both populations mutates in the following way. Generate $U \sim \text{Uniform}[0, 1]$. If $0 \le U < m_{s1}$, the added individual mutates to species 1. If $m_{s1} \le U < m_{s1} + m_{s2}$, the added individual mutates to species 2, and so on. Finally, if $m_{s1} + m_{s2} + \cdots + m_{s(d-1)} \le U \le 1$, the added individual mutates to species $d$. Hence, after the mutation, both populations have an individual of the same species added, which preserves the partial ordering between their species configurations.

Next, suppose $k_1 + k_2 + 1 \le i_2 \le N$; then $s_{1,i_2} = d$ and $s := s_{2,i_2}$ is one of the first $d - 1$ species. Note that $m_{sj} \ge m_{dj}$ for every $j = 1, 2, \ldots, d - 1$. The newly added individual in population 1 mutates in the following way. Generate $U \sim \text{Uniform}[0, 1]$. If $0 \le U < m_{d1}$, the added individual mutates to species 1. If $m_{d1} \le U < m_{d1} + m_{d2}$, the added individual mutates to species 2, and so on. Finally, if $m_{d1} + m_{d2} + \cdots + m_{d(d-1)} \le U \le 1$, the added individual mutates to species $d$. In population 2, the newly added individual mutates in the following way, using the same $U$ as for population 1. If $0 \le U < m_{d1}$ or $m_{d1} + m_{d2} + \cdots + m_{d(d-1)} \le U < m_{s1} + m_{d2} + \cdots + m_{d(d-1)}$, the individual mutates to species 1. If $m_{d1} \le U < m_{d1} + m_{d2}$ or $m_{s1} + m_{d2} + \cdots + m_{d(d-1)} \le U < m_{s1} + m_{s2} + m_{d3} + \cdots + m_{d(d-1)}$, the individual mutates to species 2, and so on. Finally, if $m_{s1} + m_{s2} + \cdots + m_{s(d-1)} \le U \le 1$, the individual mutates to species $d$. In total, species $j \le d - 1$ receives intervals of length $m_{dj} + (m_{sj} - m_{dj}) = m_{sj}$, so the added individual in population 2 has the correct mutation law. Hence, when $0 \le U < m_{d1} + m_{d2} + \cdots + m_{d(d-1)}$ or when $m_{s1} + m_{s2} + \cdots + m_{s(d-1)} \le U \le 1$, the newly added individuals in both populations mutate to the same species, which preserves the partial ordering between their species configurations. Alternatively, if $m_{d1} + m_{d2} + \cdots + m_{d(d-1)} \le U < m_{s1} + m_{s2} + \cdots + m_{s(d-1)}$, then after mutation the newly added individual in population 1 is of species $d$, but the newly added individual in population 2 is of one of the first $d - 1$ species. This again preserves the partial ordering between the species configurations in population 1 and population 2.

(IV) Finally, the individual corresponding to the label $i_1$ dies in both populations. If $1 \le i_1 \le k_1 + k_2$, then this individual belongs to the same species in both populations. If $k_1 + k_2 + 1 \le i_1 \le N$, then the individual corresponding to the label $i_1$ belongs to species $d$ in population 1 and is an "extra individual" in the first $d - 1$ species of population 2. An extra individual exists in species $j$ only when $y_j \ge x_j + 1$, and step (III) never decreases the difference between the species-$j$ counts of population 2 and population 1, so in either case the partial ordering is preserved.

Let $X$ and $Y$ be the resulting species configurations of population 1 and population 2, respectively. Note that, marginally, the movements from $x$ to $X$ and from $y$ to $Y$ both follow the transition mechanism of $K$, and $X \preceq Y$. This completes the proof. □
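The four sub-steps can be simulated directly. The Python sketch below is our own transcription of steps (I)-(IV); labels are 0-based, and all helper names are hypothetical. It uses the same uniform $U$ for both populations, with the interval arrangement described in step (III), and checks over many coupled steps that $X \preceq Y$ is never violated. The mutation matrix is a hypothetical example satisfying (3.3).

```python
import random

def label_species(x, y):
    """Species of each label (0-based) in populations 1 and 2 under the labeling rules."""
    d = len(x)
    sp1 = [i for i in range(d) for _ in range(x[i])]        # consecutive blocks
    sp2 = list(sp1)                                          # shared labels keep species
    k = sum(x[:d - 1]) + y[d - 1]                            # k1 + k2
    sp2[k:] = [i for i in range(d - 1) for _ in range(y[i] - x[i])]  # extras
    return sp1, sp2

def mutate(M, s, u):
    """Inverse-CDF mutation of an offspring of species s."""
    cum = 0.0
    for j, prob in enumerate(M[s]):
        cum += prob
        if u < cum:
            return j
    return len(M) - 1

def mutate_pop2(M, s, u):
    """Population-2 rule of step (III) when population 1's parent has species d."""
    D = len(M) - 1
    cum = 0.0
    for j in range(D):                 # pieces shared with population 1
        cum += M[D][j]
        if u < cum:
            return j
    for j in range(D):                 # excess pieces of length m_sj - m_dj >= 0
        cum += M[s][j] - M[D][j]
        if u < cum:
            return j
    return D

def coupled_step(x, y, M, rng):
    N = sum(x)
    sp1, sp2 = label_species(x, y)
    i1, i2 = rng.randrange(N), rng.randrange(N)        # steps (I) and (II)
    u = rng.random()
    if sp1[i2] == sp2[i2]:                             # step (III), same parent species
        new1 = new2 = mutate(M, sp1[i2], u)
    else:                                              # parent species d versus s < d
        new1, new2 = mutate(M, sp1[i2], u), mutate_pop2(M, sp2[i2], u)
    X, Y = list(x), list(y)
    X[new1] += 1
    Y[new2] += 1
    X[sp1[i1]] -= 1                                    # step (IV): label i1 dies in both
    Y[sp2[i1]] -= 1
    return tuple(X), tuple(Y)

def precedes(x, y):
    return all(xi <= yi for xi, yi in zip(x[:-1], y[:-1]))

M = [[0.5, 0.3, 0.2],
     [0.3, 0.5, 0.2],
     [0.1, 0.1, 0.8]]                  # satisfies (3.3)
rng = random.Random(1)
x, y = (0, 0, 8), (3, 4, 1)
ordered = True
for _ in range(5000):
    x, y = coupled_step(x, y, M, rng)
    ordered = ordered and precedes(x, y)
print(ordered, x, y)
```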



3.1.1. Bounds on total variation distance. For the partial ordering, ^, 
discussed above, applying Theorem 2.1 in the case of the Moran model, 
provides us with bounds on the total variation distance. We have shown that 
for the Moran process, K is monotone with respect to the partial ordering, ^; 
see Theorem 3.1. It is easily seen that (with first d— 1 entries equal to zero, 
and the dth entry equal to N) is dominated by x for every x G X^. Hence, 
the pair-wise dominance property is satisfied. Recall that by Proposition 3.1, 
there exists an eigenfunction /(x) = Yli=i '^i^i — ~ o-cdxi + Na^ of 

K corresponding to the eigenvalue A = 1 — + ^ , such that / is strictly 
monotone with respect to the partial ordering, <. Hence, the conditions of 
Theorem 2.1 are satisfied, and the bounds on the total variation distance 
are obtained as 

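$$\frac{\lambda^n}{2c_2}|f(\mathbf{x})| \le \|K_{\mathbf{x}}^n - \pi\|_{TV} \le \frac{\lambda^n}{c_1} E_\pi\{f(\mathbf{Y}) + f(\mathbf{x}) - 2f(\mathbf{0})\}$$

$$\Longrightarrow \frac{\lambda^n}{2c_2}|f(\mathbf{x})| \le \|K_{\mathbf{x}}^n - \pi\|_{TV} \le \frac{\lambda^n}{c_1}\{f(\mathbf{x}) - 2f(\mathbf{0})\}$$

$$\Longrightarrow \frac{\lambda^n}{2c_2}\left|\sum_{i=1}^{d-1} a_i^* x_i + N a_d\right| \le \|K_{\mathbf{x}}^n - \pi\|_{TV} \le \frac{\lambda^n}{c_1}\left\{\sum_{i=1}^{d-1} a_i^* x_i - N a_d\right\},$$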



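where $a_i^* = a_i - a_d > 0$ for every $1 \le i \le d-1$ (by the monotonicity of $f$), $c_1 = \min_{1 \le i \le d-1} a_i^* > 0$ and $c_2 = \max\{-N a_d, N(\max_{1 \le i \le d-1} a_i^* + a_d)\}$. Note that $a_d < 0$: $f(\mathbf{0}) = N a_d$ is the strict minimum of $f$ over $\mathcal{X}_N$, whereas $E_\pi f = 0$ because $\pi K = \pi$ forces $E_\pi f = \lambda E_\pi f$ with $\lambda \ne 1$ (the same fact removes $E_\pi f(\mathbf{Y})$ in the second line above). Note again that the stationary distribution $\pi$ is not known in general, but the analysis above leads to upper and lower bounds which do not depend on the stationary distribution and are reasonably close to each other: both decay at the geometric rate $\lambda^n$ and differ only in their constant factors.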
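The two-sided bound above can be evaluated mechanically once $\mathbf{a}^*$, $a_d$ and $\lambda$ are available. The Python sketch below is a direct transcription; the function name `linear_tv_bounds` and its argument layout are our own, and $\lambda$ must be supplied by the caller (for instance from Proposition 3.1), since it is not computed here.

```python
from typing import Sequence, Tuple


def linear_tv_bounds(x: Sequence[int], a_star: Sequence[float], a_d: float,
                     lam: float, n: int) -> Tuple[float, float]:
    """Bounds on ||K_x^n - pi||_TV for f(x) = sum_{i<d} a_i^* x_i + N a_d.

    x      : starting state (x_1, ..., x_d) with sum(x) = N
    a_star : (a_1^*, ..., a_{d-1}^*), every entry strictly positive
    a_d    : constant coefficient of f, assumed negative
    lam    : eigenvalue of K belonging to f (supplied by the caller)
    n      : number of steps
    """
    if len(a_star) != len(x) - 1:
        raise ValueError("a_star must have d - 1 entries")
    if min(a_star) <= 0 or a_d >= 0:
        raise ValueError("need a_i^* > 0 for every i < d and a_d < 0")
    N = sum(x)
    lin = sum(a * xi for a, xi in zip(a_star, x[:-1]))  # sum_{i<d} a_i^* x_i
    c1 = min(a_star)
    c2 = max(-N * a_d, N * (max(a_star) + a_d))        # c_2 as defined above
    lower = lam ** n * abs(lin + N * a_d) / (2 * c2)   # lam^n |f(x)| / (2 c_2)
    upper = lam ** n * (lin - N * a_d) / c1            # lam^n {f(x) - 2 f(0)} / c_1
    return lower, upper


# Parameters of the special case in Section 3.1.2: a_i^* = 1, a_d = -(1 - p_d).
print(linear_tv_bounds([0, 10, 0, 10, 80], [1, 1, 1, 1], -0.8, 1 - 0.7 / 100, 500))
```

Because the lower and upper bounds share the factor $\lambda^n$, only the two constants need to be recomputed when the starting state changes.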



3.1.2. Bounds on total variation distance in the special case. We now provide a nonasymptotic convergence analysis for the special choice $M = (1-m)I + mP$. It has been proved earlier in Khare and Zhou [9] that the Markov chain $K$ corresponding to the multi-allele Moran model with $M = (1-m)I + mP$ has second largest eigenvalue $\lambda = 1 - \frac{|\alpha|}{N(N+|\alpha|)}$, where






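$|\alpha| := \sum_{i=1}^d \alpha_i$, $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_d)$, where $\alpha_i = \frac{N m p_i}{1-m}$, with the eigenspace given by the space of centered linear functions of $x_1, x_2, \ldots, x_{d-1}$. After simplification, we obtain $\lambda = 1 - \frac{m}{N}$. It is known that the stationary distribution in this case is the Dirichlet-multinomial distribution with parameters $N$ and $\boldsymbol{\alpha}$. The Dirichlet-multinomial distribution with parameters $N$ and $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_d)$, $\alpha_i > 0$, has probability mass function given by

$$\pi(\mathbf{x}) = \frac{N!}{\prod_{i=1}^d x_i!}\,\frac{\Gamma(|\alpha|)}{\Gamma(N+|\alpha|)}\prod_{i=1}^d \frac{\Gamma(x_i+\alpha_i)}{\Gamma(\alpha_i)}, \qquad \mathbf{x} \in \mathcal{X}_N.$$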
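For numerical work this probability mass function is best evaluated on the log scale. A minimal Python sketch using `math.lgamma` follows; the helper names are hypothetical, and the brute-force enumeration of $\mathcal{X}_N$ is only practical for small $N$ and $d$.

```python
import math
from itertools import product
from typing import Iterator, Sequence, Tuple


def dm_logpmf(x: Sequence[int], alpha: Sequence[float]) -> float:
    """log pi(x) for the Dirichlet-multinomial with N = sum(x) and parameter alpha."""
    N, A = sum(x), sum(alpha)
    log_multinom = math.lgamma(N + 1) - sum(math.lgamma(xi + 1) for xi in x)
    log_norm = math.lgamma(A) - math.lgamma(N + A)
    log_rising = sum(math.lgamma(xi + ai) - math.lgamma(ai)
                     for xi, ai in zip(x, alpha))
    return log_multinom + log_norm + log_rising


def states(N: int, d: int) -> Iterator[Tuple[int, ...]]:
    """Enumerate X_N: nonnegative integer d-vectors summing to N."""
    for head in product(range(N + 1), repeat=d - 1):
        if sum(head) <= N:
            yield (*head, N - sum(head))


alpha = (0.5, 1.2, 2.0)
total = sum(math.exp(dm_logpmf(x, alpha)) for x in states(6, 3))
print(total)  # should equal 1 up to rounding
```

Working with `lgamma` avoids overflow for the population sizes used in the examples below, where $\pi(\mathbf{x})$ is far below the smallest representable ratio of factorials.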



Since $M = (1-m)I + mP$, it follows that $M^* = (1-m)I_{d-1}$. Hence, any $(d-1)$-dimensional vector with positive entries is an eigenvector of $M^*$. Suppose we choose the eigenvector $\mathbf{a}^*$ of $M^*$ such that $a_i^* = 1$ for $i < d$. Then, for the Markov chain $K$, we get the eigenfunction $f(\mathbf{x}) = \sum_{i=1}^{d-1} x_i - N(1 - p_d)$ corresponding to the eigenvalue $\lambda = 1 - \frac{m}{N}$. Note that $f$ is strictly monotone with respect to the partial ordering $\preceq$. As in the case of the general multi-allele Moran model, here also it is easily seen that $\mathbf{0}$ is dominated by $\mathbf{x}$ for every $\mathbf{x} \in \mathcal{X}_N$. We have $c_1 = 1$ and $c_2 = \max\{N p_d, N(1 - p_d)\}$. Thus, bounds on the total variation distance are obtained as

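$$\|K_{\mathbf{x}}^n - \pi\|_{TV} \ge \frac{\left(1 - \frac{m}{N}\right)^n \left|\sum_{i=1}^{d-1} x_i - N(1-p_d)\right|}{2\max\{N p_d, N(1-p_d)\}}, \tag{3.8}$$

$$\|K_{\mathbf{x}}^n - \pi\|_{TV} \le \left(1 - \frac{m}{N}\right)^n \left(\sum_{i=1}^{d-1} x_i + N(1-p_d)\right). \tag{3.9}$$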
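The identity $M^* = (1-m)I_{d-1}$ can be checked numerically. The matrix $M^*$ is defined earlier in the paper; the Python sketch below assumes the definition $M^*_{ij} = m_{ij} - m_{dj}$ for $1 \le i, j \le d-1$, which is consistent with the identity used above. It builds $M = (1-m)I + mP$ with every row of $P$ equal to $\mathbf{p}$; the function names are illustrative only.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mutation_matrix(m: float, p: Sequence[float]) -> Matrix:
    """M = (1 - m) I + m P, where every row of P equals p."""
    d = len(p)
    return [[(1 - m) * (i == j) + m * p[j] for j in range(d)] for i in range(d)]


def reduced_matrix(M: Matrix) -> Matrix:
    """Assumed definition: M*_{ij} = m_{ij} - m_{dj} for 1 <= i, j <= d - 1."""
    d = len(M)
    return [[M[i][j] - M[d - 1][j] for j in range(d - 1)] for i in range(d - 1)]


def mat_vec(A: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


m, p = 0.7, [0.2] * 5
M = mutation_matrix(m, p)
M_star = reduced_matrix(M)
a_star = [1.0] * 4               # the eigenvector chosen in the text
print(mat_vec(M_star, a_star))   # each entry should be close to 1 - m
```

Since $M^*$ is a multiple of the identity here, the choice $a_i^* = 1$ is purely a convenience that makes $c_1 = 1$.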




Example 3.1.1. Consider the multi-allele Moran model in the special case when the mutation matrix is $M = (1-m)I + mP$. Suppose the population size is $N = 100$, with $d = 5$ species and mutation probability $m = 0.7$. When mutation occurs, the individual mutates to the $i$th species with probability $p_i = 1/5$. Using (3.8) and (3.9), for the starting state $\mathbf{x} = (0, 10, 0, 10, 80)$, the bounds on the total variation distance are obtained as

$$0.375\left(1 - \frac{7}{1000}\right)^n \le \|K_{\mathbf{x}}^n - \pi\|_{TV} \le 100\left(1 - \frac{7}{1000}\right)^n. \tag{3.10}$$

For $\epsilon = 0.01$, (3.10) tells us that 516 steps are necessary and 1312 steps are sufficient for the total variation distance to be less than 0.01. The crude upper bound for the total variation distance is $(2.1665 \times 10^{15})(1 - \frac{7}{1000})^n$, which would have implied that 5683 steps are sufficient for the total variation distance to be less than 0.01.
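The constants and step counts of Example 3.1.1 can be reproduced in a few lines of Python. The sketch assumes that the crude bound is $\frac{1}{2\sqrt{\pi(\mathbf{x})}}\lambda^n$ (the form written out for the Ehrenfest model in Section 3.3.1), with $\pi$ the Dirichlet-multinomial distribution and $\alpha_i = Nmp_i/(1-m)$; with these choices it recovers the constant quoted above. "Steps necessary" is read as the first $n$ at which the lower bound falls to $\epsilon$, and "steps sufficient" as the first $n$ at which the upper bound does. Helper names are our own.

```python
import math
from typing import Sequence


def dm_logpmf(x: Sequence[int], alpha: Sequence[float]) -> float:
    """log pmf of the Dirichlet-multinomial distribution with N = sum(x)."""
    N, A = sum(x), sum(alpha)
    return (math.lgamma(N + 1) - sum(math.lgamma(v + 1) for v in x)
            + math.lgamma(A) - math.lgamma(N + A)
            + sum(math.lgamma(v + a) - math.lgamma(a) for v, a in zip(x, alpha)))


def first_n_below(coef: float, lam: float, eps: float) -> int:
    """Smallest integer n >= 0 with coef * lam**n <= eps, for 0 < lam < 1."""
    return max(0, math.ceil(math.log(eps / coef) / math.log(lam)))


N, d, m, eps = 100, 5, 0.7, 0.01
p = [1 / d] * d
x = (0, 10, 0, 10, 80)
lam = 1 - m / N
p_d, s_x = p[-1], sum(x[:-1])

lower_coef = abs(s_x - N * (1 - p_d)) / (2 * max(N * p_d, N * (1 - p_d)))  # (3.8)
upper_coef = s_x + N * (1 - p_d)                                             # (3.9)
alpha = [N * m * q / (1 - m) for q in p]   # assumed alpha_i = N m p_i / (1 - m)
crude_coef = 0.5 * math.exp(-0.5 * dm_logpmf(x, alpha))

print(lower_coef, upper_coef, f"{crude_coef:.4e}")
print(first_n_below(lower_coef, lam, eps),   # steps necessary
      first_n_below(upper_coef, lam, eps),   # steps sufficient
      first_n_below(crude_coef, lam, eps))   # steps implied by the crude bound
```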



3.2. Sequential Polya urn models. Choose $d$ urns with $N$ balls distributed in them. Suppose the inherent weight of urn $i$ is $\alpha_i$, $i = 1, 2, \ldots, d$, and let $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_d)$ denote the vector of urn weights and $|\alpha| = \sum_{i=1}^d \alpha_i$ denote the total inherent weight of the $d$ urns. Suppose that each ball has unit weight.

(1) Polya level model [9]: Consider the Markov chain whose one-step movement consists of the following sub-steps:

(i) Randomly choose s balls out of N balls and mark them. 

(ii) Draw an urn with probability proportional to its weight (inherent 
weight + weight of balls) and add a ball (of unit weight) to the cho- 
sen urn. Repeat this s times. 

(iii) Remove the s marked balls from the respective urns. 

(2) Polya up-down model [9]: This is a variation of the Polya level model in which the three sub-steps are performed in the order (ii), (i) (with $N + s$ balls in total) and (iii).

(3) Polya down-up model [9]: This is a variation of the Polya level model in which the three sub-steps are performed in the order (i), (iii) and (ii).
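The three dynamics differ only in the order of the sub-steps, which a small simulator makes explicit. The Python sketch below is our own reading of the verbal description (names and the `variant` flag are hypothetical, not taken from [9]). As a rough sanity check it compares a Monte Carlo estimate of $E[f(\mathbf{X}_1)]$ for the level model with $\lambda f(\mathbf{x})$, assuming the eigenfunction $f(\mathbf{x}) = \sum_{i<d} x_i - N(1 - \alpha_d/|\alpha|)$ and eigenvalue $\lambda = 1 - s|\alpha|/(N(N+|\alpha|))$ of Section 3.2.1.

```python
import random
from typing import List, Sequence


def polya_draw(counts: List[int], alpha: Sequence[float], rng: random.Random) -> int:
    """One draw of sub-step (ii): urn i with probability proportional to alpha_i + counts[i]."""
    weights = [a + c for a, c in zip(alpha, counts)]
    return rng.choices(range(len(counts)), weights=weights)[0]


def mark_balls(counts: Sequence[int], s: int, rng: random.Random) -> List[int]:
    """Sub-step (i): mark s balls uniformly without replacement; return their urns."""
    balls = [i for i, c in enumerate(counts) for _ in range(c)]
    return rng.sample(balls, s)


def polya_step(x: Sequence[int], alpha: Sequence[float], s: int,
               variant: str, rng: random.Random) -> List[int]:
    counts = list(x)
    if variant == "level":          # (i), (ii), (iii)
        marked = mark_balls(counts, s, rng)
        for _ in range(s):
            counts[polya_draw(counts, alpha, rng)] += 1
    elif variant == "up-down":      # (ii), then (i) on N + s balls, then (iii)
        for _ in range(s):
            counts[polya_draw(counts, alpha, rng)] += 1
        marked = mark_balls(counts, s, rng)
    elif variant == "down-up":      # (i), (iii), then (ii)
        for i in mark_balls(counts, s, rng):
            counts[i] -= 1
        for _ in range(s):
            counts[polya_draw(counts, alpha, rng)] += 1
        return counts
    else:
        raise ValueError(f"unknown variant: {variant}")
    for i in marked:                # (iii): remove the marked balls
        counts[i] -= 1
    return counts


rng = random.Random(1)
N, s = 100, 2
alpha = [180.0] * 5
x = (0, 20, 0, 20, 60)
A = sum(alpha)
lam = 1 - s * A / (N * (N + A))


def f(c: Sequence[int]) -> float:
    """Assumed eigenfunction: sum_{i<d} c_i - N (1 - alpha_d / |alpha|)."""
    return sum(c[:-1]) - N * (1 - alpha[-1] / A)


reps = 20_000
mc = sum(f(polya_step(x, alpha, s, "level", rng)) for _ in range(reps)) / reps
print(f"Monte Carlo E f(X_1) = {mc:.3f}, lambda * f(x) = {lam * f(x):.3f}")
```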

We first analyze the Markov chain corresponding to the Polya level model. Let $X_{ni}$ denote the number of balls in the $i$th urn at the $n$th step of the Polya level model. Then $\{\mathbf{X}_n = (X_{n1}, X_{n2}, \ldots, X_{nd}), n = 0, 1, 2, \ldots\}$ forms a multivariate Markov chain on $\mathcal{X}_N$. Let $K$ denote the transition density of this Markov chain. Let $\preceq$ be the partial ordering on $\mathcal{X}_N$ as in the multi-allele Moran model.

Theorem 3.2. $K$ is monotone with respect to the partial ordering $\preceq$.

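Proof. Consider any $\mathbf{x} \in \mathcal{X}_N$ and $\mathbf{y} \in \mathcal{X}_N$ with $\mathbf{x} \preceq \mathbf{y}$. We construct two random vectors $\mathbf{X}$ and $\mathbf{Y}$ such that $\mathbf{X} \preceq \mathbf{Y}$ with $\mathbf{X} \sim K(\mathbf{x}, \cdot)$ and $\mathbf{Y} \sim K(\mathbf{y}, \cdot)$. This will immediately imply that $Kf(\mathbf{x}) \le Kf(\mathbf{y})$ for any monotone function $f$ and any $\mathbf{x}, \mathbf{y}$ with $\mathbf{x} \preceq \mathbf{y}$.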

In order to specify the coupling argument, we consider two populations of $N$ balls each, with the balls distributed in $d$ urns according to $\mathbf{x}$ and $\mathbf{y}$, respectively. We use the same labeling technique for both populations as in the proof of Theorem 3.1 (regarding species as urns and individuals as balls).

We now change the urn configuration of population 1 and population 2 
in three sub-steps which are described below: 

(I) Choose $s$ labels without replacement from 1 to $N$.
(II) This sub-step consists of $s$ sequential urn draws; after each draw, an extra ball is added to the chosen urn in both populations as described below. Repeat the following for $j = 1, 2, \ldots, s$.

Generate $U_j \sim \mathrm{Uniform}[0, 1]$. At the beginning of the $j$th draw in this sub-step, there are in total $N + j - 1$ balls in each of the two populations. Hence the total weight of the urns (with balls) in each population is $|\alpha| + N + j - 1$. Let $\mathbf{X}^{j-1} := (x_1^{j-1}, x_2^{j-1}, \ldots, x_d^{j-1})$ be the configuration of the balls in the $d$ urns of population 1 at the beginning of the $j$th draw, and $\mathbf{Y}^{j-1} := (y_1^{j-1}, y_2^{j-1}, \ldots, y_d^{j-1})$ the configuration of the balls in the $d$ urns of population 2 at the beginning of the $j$th draw. Let us denote the normalized probability vector of the urn weights for population 1 by $\mathbf{p}^{j-1} = (p_1^{j-1}, p_2^{j-1}, \ldots, p_d^{j-1})$ and the normalized probability vector of the urn weights for population 2 by $\mathbf{q}^{j-1} = (q_1^{j-1}, q_2^{j-1}, \ldots, q_d^{j-1})$, where

$$p_i^{j-1} = \frac{x_i^{j-1} + \alpha_i}{|\alpha| + N + j - 1}, \qquad q_i^{j-1} = \frac{y_i^{j-1} + \alpha_i}{|\alpha| + N + j - 1}, \qquad 1 \le i \le d.$$

In the two procedures below, the superscript $j-1$ is dropped from $p_i^{j-1}$ and $q_i^{j-1}$.

Procedure to choose an urn for population 1 at the $j$th draw:

If $0 \le U_j < p_1$, choose urn 1. If $p_1 \le U_j < p_1 + p_2$, choose urn 2, and so on. Finally, if $p_1 + p_2 + \cdots + p_{d-1} \le U_j \le 1$, choose urn $d$. Add a ball to the chosen urn.

Procedure to choose an urn for population 2 at the $j$th draw:

If $0 \le U_j < p_1$ or $p_1 + p_2 + \cdots + p_{d-1} \le U_j < q_1 + p_2 + \cdots + p_{d-1}$, choose urn 1. If $p_1 \le U_j < p_1 + p_2$ or $q_1 + p_2 + \cdots + p_{d-1} \le U_j < q_1 + q_2 + p_3 + \cdots + p_{d-1}$, choose urn 2, and so on. Finally, if $q_1 + q_2 + \cdots + q_{d-1} \le U_j \le 1$, choose urn $d$. Add a ball to the chosen urn.

(III) Remove the balls corresponding to the $s$ labels chosen in sub-step (I) from both populations.

It is to be noted that in sub-step (II), assuming $\mathbf{X}^{j-1} \preceq \mathbf{Y}^{j-1}$ (and hence $\mathbf{p}^{j-1} \preceq \mathbf{q}^{j-1}$, since the two vectors share the denominator $|\alpha| + N + j - 1$), the mechanism for drawing urns is such that either the same urn is chosen for both populations, or the $d$th urn is chosen for population 1 while one of the first $d-1$ urns is chosen for population 2. Hence $\mathbf{X}^{j} \preceq \mathbf{Y}^{j}$ (and hence $\mathbf{p}^{j} \preceq \mathbf{q}^{j}$). Since $\mathbf{X}^0 = \mathbf{x}$ and $\mathbf{Y}^0 = \mathbf{y}$, it follows by induction on $j$ that $\mathbf{X}^j \preceq \mathbf{Y}^j$ for $j = 1, 2, \ldots, s$. In sub-step (III), the balls with the same $s$ labels are removed from both populations. Based on the labeling procedure, either a ball with a given label lies in the same urn in both populations, or it lies in the $d$th urn for population 1 and is an "extra ball" in the first $d-1$ urns for population 2. In either case, removing balls with the same label from both populations does not change the partial ordering of the urn configurations.

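Let $\mathbf{X}$ and $\mathbf{Y}$ be the resulting urn configurations of population 1 and population 2, respectively. It follows from the discussion above that $\mathbf{X} \preceq \mathbf{Y}$. Note that marginally the movements from $\mathbf{x}$ to $\mathbf{X}$ and from $\mathbf{y}$ to $\mathbf{Y}$ both follow the transition mechanism of $K$. To see this, note that for $1 \le i \le d-1$ the probability of choosing the $i$th urn at the $j$th draw in sub-step (II) for population 1 is

$$P\left(\sum_{\ell=1}^{i-1} p_\ell \le U_j < \sum_{\ell=1}^{i} p_\ell\right) = p_i,$$

and the corresponding probability for population 2 is

$$P\left(\sum_{\ell=1}^{i-1} p_\ell \le U_j < \sum_{\ell=1}^{i} p_\ell\right) + P\left(\sum_{k=1}^{i-1} q_k + \sum_{\ell=i}^{d-1} p_\ell \le U_j < \sum_{k=1}^{i} q_k + \sum_{\ell=i+1}^{d-1} p_\ell\right) = p_i + (q_i - p_i) = q_i$$

(the $d$th urn is then chosen with probabilities $p_d$ and $q_d$, respectively). This completes the proof. □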
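Sub-step (II) drives both populations with one uniform variable. A Python sketch of a single coupled draw follows (urns are 0-based, so urn $d$ is index `d - 1`). The function name is hypothetical, and the sketch assumes that $\mathbf{p}$ and $\mathbf{q}$ are probability vectors with $p_i \le q_i$ for $i < d$, which is how we read $\mathbf{p} \preceq \mathbf{q}$. The usage lines check, on a fine grid of values of $U_j$, that the marginal frequencies match $\mathbf{p}$ and $\mathbf{q}$.

```python
from typing import Sequence, Tuple


def coupled_urn_draw(p: Sequence[float], q: Sequence[float], u: float) -> Tuple[int, int]:
    """One draw of sub-step (II): urns (0-based) for populations 1 and 2 given U_j = u."""
    d = len(p)
    acc = 0.0
    for i in range(d - 1):          # intervals [p_1 + ... + p_{i-1}, p_1 + ... + p_i)
        acc += p[i]
        if u < acc:
            return i, i             # same urn in both populations
    # Here u >= p_1 + ... + p_{d-1}: population 1 takes urn d. Population 2 uses
    # consecutive slices of length q_i - p_i, then urn d once u >= q_1 + ... + q_{d-1}.
    for i in range(d - 1):
        acc += q[i] - p[i]
        if u < acc:
            return d - 1, i
    return d - 1, d - 1


p = (0.10, 0.20, 0.30, 0.40)
q = (0.15, 0.25, 0.35, 0.25)        # q_i >= p_i for i < d
grid = [(k + 0.5) / 100_000 for k in range(100_000)]
draws = [coupled_urn_draw(p, q, u) for u in grid]
freq1 = [sum(a == i for a, _ in draws) / len(grid) for i in range(4)]
freq2 = [sum(b == i for _, b in draws) / len(grid) for i in range(4)]
print(freq1, freq2)                 # approximately p and q
```

Using inverse-CDF slices in this order is what guarantees that population 2 never lands in urn $d$ when population 1 lands in an earlier urn.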
We can similarly argue that the Markov chains corresponding to the Polya up-down model and the Polya down-up model are stochastically monotone with respect to the partial ordering $\preceq$ on $\mathcal{X}_N$.

3.2.1. Bounds on total variation distance. In the case of the Polya level model, the second largest eigenvalue is $\lambda = 1 - \frac{s|\alpha|}{N(N+|\alpha|)}$. We know that the stationary distribution of the Polya level model is the Dirichlet-multinomial distribution with parameters $N$ and $\boldsymbol{\alpha}$. The eigenfunction $f(\mathbf{x}) = \sum_{i=1}^{d-1} x_i - N\left(1 - \frac{\alpha_d}{|\alpha|}\right)$ corresponding to $\lambda$ is strictly monotone in $\preceq$. Let $\mathbf{0}$ be a $d$-dimensional vector such that the first $d-1$ entries are zero and the $d$th entry is $N$. It is easily seen that $\mathbf{0}$ is dominated by $\mathbf{x}$ for every $\mathbf{x} \in \mathcal{X}_N$. Hence, the conditions of Theorem 2.1 are satisfied, with $c_1 = 1$ and $c_2 = \max\left\{N\frac{\alpha_d}{|\alpha|}, N\left(1 - \frac{\alpha_d}{|\alpha|}\right)\right\}$. Let $p_i = \frac{\alpha_i}{|\alpha|}$, $i = 1, 2, \ldots, d$. Thus, the bounds on the total variation distance are obtained as

d-l 
i=l 



(3.12) ||K^-7r||TV<A"^a;i + iV(l-pd) 

Similarly, in the case of Polya down-up models, the second largest eigenvalue 
is given byA = (l — ■^)(1 — ttTH-'"^ Polya up-down 

models, the second largest eigenvalue is given byA = (l + ;^)~^(H- 77:^1^) • 
These can be substituted in (3.11) and (3.12) to get the corresponding total 
variation bounds for these models. 



(3.11) ||K"-7r||TV> -r -, 
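For a side-by-side comparison of the three Polya dynamics, the eigenvalues can be tabulated. The Python sketch assumes $\lambda_{\text{level}} = 1 - \frac{s|\alpha|}{N(N+|\alpha|)}$, $\lambda_{\text{down-up}} = (1 - \frac{s}{N})(1 - \frac{s}{N+|\alpha|})^{-1}$ and $\lambda_{\text{up-down}} = (1 + \frac{s}{N})^{-1}(1 + \frac{s}{N+|\alpha|})$. Simple algebra gives $1 - \lambda_{\text{down-up}} = \frac{s|\alpha|}{N(N-s+|\alpha|)}$ and $1 - \lambda_{\text{up-down}} = \frac{s|\alpha|}{(N+s)(N+|\alpha|)}$, so for the same $N$, $s$ and $|\alpha|$ the down-up chain has the smallest eigenvalue and the up-down chain the largest. The function name is our own.

```python
from typing import Dict


def polya_eigenvalues(N: int, s: int, alpha_total: float) -> Dict[str, float]:
    """Second largest eigenvalues of the three sequential Polya chains (formulas above)."""
    A = alpha_total
    return {
        "level": 1 - s * A / (N * (N + A)),
        "down-up": (1 - s / N) / (1 - s / (N + A)),
        "up-down": (1 + s / (N + A)) / (1 + s / N),
    }


N, s, A = 100, 2, 900.0            # parameters of Example 3.2.1
lams = polya_eigenvalues(N, s, A)
for name, lam in lams.items():
    print(f"{name:8s} lambda = {lam:.6f}")
```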



Remark. Note that the coefficient of $\lambda^n$ in the upper bound derived in (3.12) is at most $2N$. Let us compare it to the coefficient of $\lambda^n$ in the crude upper bound, which is given by $\frac{1}{2\sqrt{\pi(\mathbf{x})}}$, where $\pi$ is the Dirichlet-multinomial distribution with parameters $N$ and $\boldsymbol{\alpha}$. At one possible extreme, when all entries of $\mathbf{x}$ except the $i$th one are zero, the coefficient is essentially a polynomial in $N$ of degree $(|\alpha| - \alpha_i)/2$. At the other possible extreme, when all the entries of $\mathbf{x}$ are equal to $N/d$ (assuming $N/d$ is an integer), the coefficient is essentially a polynomial in $N$ of degree $(d-1)/2$. The main fact is that the coefficient of $\lambda^n$ in the upper bound derived in (3.12) is linear in $N$, whereas the coefficient of $\lambda^n$ in the crude upper bound almost always behaves like a polynomial of a higher degree in $N$.
Example 3.2.1. Consider the Polya level model where $N = 100$ balls are distributed in $d = 5$ urns. Suppose $s = 2$ balls are chosen and each urn has inherent weight $\alpha_i = 180$ for every $1 \le i \le 5$. Using (3.11) and (3.12), for the starting state $\mathbf{x} = (0, 20, 0, 20, 60)$, the bounds on the total variation distance are obtained as

$$0.25\left(1 - \frac{9}{500}\right)^n \le \|K_{\mathbf{x}}^n - \pi\|_{TV} \le 120\left(1 - \frac{9}{500}\right)^n. \tag{3.13}$$

For $\epsilon = 0.01$, (3.13) tells us that 178 steps are necessary and 518 steps are sufficient for the total variation distance to be less than 0.01. The crude upper bound for the total variation distance is $(6.1094 \times 10^{13})(1 - \frac{9}{500})^n$, which would have implied that 2002 steps are sufficient for the total variation distance to be less than 0.01.
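The numbers in Example 3.2.1 can be recomputed the same way. The Python sketch assumes, as before, that the crude constant is $\frac{1}{2\sqrt{\pi(\mathbf{x})}}$ with $\pi$ the Dirichlet-multinomial distribution (here with $\alpha_i = 180$), and that $\lambda = 1 - s|\alpha|/(N(N+|\alpha|))$; helper names are our own.

```python
import math
from typing import Sequence


def dm_logpmf(x: Sequence[int], alpha: Sequence[float]) -> float:
    """log pmf of the Dirichlet-multinomial distribution with N = sum(x)."""
    N, A = sum(x), sum(alpha)
    return (math.lgamma(N + 1) - sum(math.lgamma(v + 1) for v in x)
            + math.lgamma(A) - math.lgamma(N + A)
            + sum(math.lgamma(v + a) - math.lgamma(a) for v, a in zip(x, alpha)))


def first_n_below(coef: float, lam: float, eps: float) -> int:
    """Smallest integer n >= 0 with coef * lam**n <= eps, for 0 < lam < 1."""
    return max(0, math.ceil(math.log(eps / coef) / math.log(lam)))


N, s, eps = 100, 2, 0.01
alpha = [180.0] * 5
x = (0, 20, 0, 20, 60)
A = sum(alpha)
p_d = alpha[-1] / A
lam = 1 - s * A / (N * (N + A))                 # 0.982 for these parameters
s_x = sum(x[:-1])

lower_coef = abs(s_x - N * (1 - p_d)) / (2 * max(N * p_d, N * (1 - p_d)))  # (3.11)
upper_coef = s_x + N * (1 - p_d)                                             # (3.12)
crude_coef = 0.5 * math.exp(-0.5 * dm_logpmf(x, alpha))

print(lower_coef, upper_coef, f"{crude_coef:.4e}")
print(first_n_below(lower_coef, lam, eps),
      first_n_below(upper_coef, lam, eps),
      first_n_below(crude_coef, lam, eps))
```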

3.3. A generalized Ehrenfest urn model. There are $N$ indistinguishable balls to be distributed among $d$ urns. At each step, $s$ balls are chosen at random from the total of $N$ balls, and each of them is redistributed independently according to the same probability vector $\mathbf{p} = (p_1, p_2, \ldots, p_d)$. Let $X_{ni}$ be the number of balls in the $i$th urn at the $n$th step of the Markov chain. Then $\{\mathbf{X}_n = (X_{n1}, X_{n2}, \ldots, X_{nd}), n = 0, 1, 2, \ldots\}$ forms a multivariate Markov chain on $\mathcal{X}_N$. Let $K$ denote the transition density of this Markov chain.
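A one-step simulator makes the dynamics concrete. The Python sketch below is a direct reading of the verbal description (the helper name is ours). The usage lines compare a Monte Carlo estimate of $E[f(\mathbf{X}_1)]$ with $(1 - s/N) f(\mathbf{x})$ for $f(\mathbf{x}) = \sum_{i<d} x_i - N(1 - p_d)$, the eigenfunction discussed in Section 3.3.1; this is a rough numerical check, not a proof.

```python
import random
from typing import List, Sequence


def ehrenfest_step(x: Sequence[int], p: Sequence[float], s: int,
                   rng: random.Random) -> List[int]:
    """Pick s of the N balls uniformly without replacement, then put each
    one independently into urn i with probability p_i."""
    counts = list(x)
    balls = [i for i, c in enumerate(counts) for _ in range(c)]
    for i in rng.sample(balls, s):
        counts[i] -= 1
    for j in rng.choices(range(len(p)), weights=p, k=s):
        counts[j] += 1
    return counts


rng = random.Random(7)
N, s, p, x = 100, 1, [0.2] * 5, (0, 20, 0, 20, 60)


def f(c: Sequence[int]) -> float:
    """Linear eigenfunction: sum_{i<d} c_i - N (1 - p_d)."""
    return sum(c[:-1]) - N * (1 - p[-1])


reps = 20_000
mc = sum(f(ehrenfest_step(x, p, s, rng)) for _ in range(reps)) / reps
print(mc, (1 - s / N) * f(x))
```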

Consider the same partial ordering $\preceq$ as defined in the case of the Moran process. We now show that $K$ is a monotone Markov chain with respect to the partial ordering $\preceq$.

Theorem 3.3. $K$ is monotone with respect to the partial ordering $\preceq$.

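Proof. Consider any $\mathbf{x} \in \mathcal{X}_N$ and $\mathbf{y} \in \mathcal{X}_N$ with $\mathbf{x} \preceq \mathbf{y}$. We construct two random vectors $\mathbf{X}$ and $\mathbf{Y}$ such that $\mathbf{X} \preceq \mathbf{Y}$ with $\mathbf{X} \sim K(\mathbf{x}, \cdot)$ and $\mathbf{Y} \sim K(\mathbf{y}, \cdot)$. This will immediately imply that $Kf(\mathbf{x}) \le Kf(\mathbf{y})$ for any monotone function $f$ and any $\mathbf{x}, \mathbf{y}$ with $\mathbf{x} \preceq \mathbf{y}$.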

In order to specify the coupling argument, we consider two populations of $N$ balls each, with the $N$ balls distributed in $d$ urns according to $\mathbf{x}$ and $\mathbf{y}$, respectively. We use the same labeling technique for both populations as in the proof of Theorem 3.1 (regarding species as urns and individuals as balls).

We now change the urn configuration of population 1 and population 2 
in five sub-steps which are described below: 

(I) Choose $s$ labels without replacement from 1 to $N$.
(II) Remove the balls with the chosen labels from both $\mathbf{x}$ and $\mathbf{y}$.
(III) Choose an urn, such that urn $i$ is chosen with probability $p_i$ for every $i = 1, 2, \ldots, d$.

(IV) Add a ball to the chosen urn for both the current $\mathbf{X}$ and $\mathbf{Y}$ configurations.

(V) Repeat sub-steps (III) and (IV) $s$ times independently.

Let $k_1 := \sum_{i=1}^{d-1} x_i$ be the total number of balls in the first $d-1$ urns of population 1, and let $k_2 := y_d$ be the number of balls in the $d$th urn of population 2. Consider sub-steps (I) and (II). Suppose that, out of the $s$ labels chosen, $r$ labels are between 1 and $k_1 + k_2$ and $s - r$ labels are between $k_1 + k_2 + 1$ and $N$.

• Each of the $r$ balls corresponding to labels 1 to $k_1 + k_2$ lies in exactly the same urn for both populations. Removing these does not change the partial ordering between the urn configurations.

• Each of the $s - r$ balls corresponding to labels $k_1 + k_2 + 1$ to $N$ lies in urn $d$ for population 1 and is an "extra ball" lying in the first $d-1$ urns for population 2. Hence, removing them does not change the partial ordering between the urn configurations of population 1 and population 2.

Consider sub-steps (III), (IV) and (V). Since the balls are put in the 
same urn for both the populations, adding the new balls does not change 
the partial ordering between the urn configurations. 

Let $\mathbf{X}$ and $\mathbf{Y}$ be the resulting urn configurations of population 1 and population 2, respectively. Note that marginally the movements from $\mathbf{x}$ to $\mathbf{X}$ and from $\mathbf{y}$ to $\mathbf{Y}$ both follow the transition mechanism of $K$, and $\mathbf{X} \preceq \mathbf{Y}$. This completes the proof. □
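The coupling in this proof can be simulated directly. The Python sketch below builds one labeling consistent with the description (labels $1, \ldots, k_1 + k_2$ lie in the same urn in both populations; the remaining labels lie in urn $d$ for population 1 and in the first $d-1$ urns for population 2) and then performs sub-steps (I)-(V) with shared randomness. It assumes that $\mathbf{x} \preceq \mathbf{y}$ means $x_i \le y_i$ for $1 \le i \le d-1$; it mirrors, rather than reproduces, the labeling of Theorem 3.1, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def precedes(x: Sequence[int], y: Sequence[int]) -> bool:
    """Assumed ordering: x_i <= y_i for the first d - 1 coordinates."""
    return all(a <= b for a, b in zip(x[:-1], y[:-1]))


def label_urns(x: Sequence[int], y: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Urn (0-based) of every label in population 1 and in population 2."""
    assert precedes(x, y) and sum(x) == sum(y)
    d = len(x)
    urn1: List[int] = []
    urn2: List[int] = []
    for i in range(d - 1):              # labels 1..k1: same urn i < d in both
        urn1 += [i] * x[i]
        urn2 += [i] * x[i]
    urn1 += [d - 1] * y[-1]             # labels k1+1..k1+k2: urn d in both
    urn2 += [d - 1] * y[-1]
    for i in range(d - 1):              # remaining labels: the "extra" balls
        urn1 += [d - 1] * (y[i] - x[i])
        urn2 += [i] * (y[i] - x[i])
    return urn1, urn2


def coupled_step(x: Sequence[int], y: Sequence[int], p: Sequence[float], s: int,
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    urn1, urn2 = label_urns(x, y)
    X, Y = list(x), list(y)
    for lab in rng.sample(range(len(urn1)), s):           # (I)-(II): same labels
        X[urn1[lab]] -= 1
        Y[urn2[lab]] -= 1
    for j in rng.choices(range(len(p)), weights=p, k=s):  # (III)-(V): same urns
        X[j] += 1
        Y[j] += 1
    return X, Y


rng = random.Random(3)
p, s = [0.1, 0.3, 0.2, 0.15, 0.25], 4
X, Y = (0, 10, 0, 10, 80), (5, 12, 3, 10, 70)
for _ in range(500):
    X, Y = coupled_step(X, Y, p, s, rng)
    assert precedes(X, Y)               # the ordering survives every step
print(X, Y)
```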

The following example illustrates the one-step movement of the above 
construction in population 1 and population 2 for Theorem 3.3. 

Example 3.3.1. Consider the same $\mathbf{x}$ and $\mathbf{y}$ as in Table 1. Suppose the 4 balls chosen in sub-step (I) have labels 6, 8, 14 and 16. It is evident that the removal of the balls with the chosen labels in sub-step (II) does not alter the partial ordering between the urn configurations of the two populations. Since the urn chosen in sub-step (III) is the same for both populations, adding a ball to that urn in sub-step (IV) does not change the partial ordering between the urn configurations of the two populations.

3.3.1. Bounds on total variation distance. For the partial ordering $\preceq$ discussed above, applying Theorem 2.1 to the generalized Ehrenfest urn model provides bounds on the total variation distance.

It has been proved earlier in Khare and Zhou [9] that the generalized Ehrenfest urn model has second largest eigenvalue $\lambda = 1 - \frac{s}{N}$, with the eigenspace given by the space of linear functions of $x_1, x_2, \ldots, x_{d-1}$. It is known that the stationary distribution is the multinomial distribution with parameters $N$ and $\mathbf{p}$. The eigenfunction $f(\mathbf{x}) = p_d\sum_{i=1}^{d-1} x_i - (1 - p_d)x_d = \sum_{i=1}^{d-1} x_i - N(1 - p_d)$ corresponding to the eigenvalue $\lambda$ is strictly monotone in $\preceq$. Again, it is easily seen that $\mathbf{0}$ is dominated by $\mathbf{x}$ for every $\mathbf{x} \in \mathcal{X}_N$. Hence, the conditions of Theorem 2.1 are satisfied. We have $c_1 = 1$ and $c_2 = \max\{N p_d, N(1 - p_d)\}$. Thus, the bounds on the total variation distance are

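$$\|K_{\mathbf{x}}^n - \pi\|_{TV} \ge \frac{\left(1 - \frac{s}{N}\right)^n \left|\sum_{i=1}^{d-1} x_i - N(1-p_d)\right|}{2\max\{N p_d, N(1-p_d)\}}, \tag{3.14}$$

$$\|K_{\mathbf{x}}^n - \pi\|_{TV} \le \left(1 - \frac{s}{N}\right)^n \left(\sum_{i=1}^{d-1} x_i + N(1-p_d)\right). \tag{3.15}$$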



Remark. Note that the coefficient of $(1 - \frac{s}{N})^n$ in the upper bound derived in (3.15) is at most $2N$. We compare it to the coefficient of $(1 - \frac{s}{N})^n$ in the crude upper bound, which is given by

$$\frac{1}{2\sqrt{\pi(\mathbf{x})}}.$$

At one possible extreme, when all entries of $\mathbf{x}$ except the $i$th one are zero, the coefficient is $\frac{1}{2}\left(\frac{1}{\sqrt{p_i}}\right)^N$. At the other possible extreme, when all the entries of $\mathbf{x}$ are equal to $N/d$ (assuming $N/d$ is an integer), using Stirling's approximation for large $N$,¹ the coefficient is

$$\frac{(2\pi N)^{(d-1)/4}}{2d^{d/4}}\left(\frac{1}{d\left(\prod_{i=1}^d p_i\right)^{1/d}}\right)^{N/2}.$$

Since $\sum_{i=1}^d p_i = 1$, it follows by the AM-GM inequality that $d\left(\prod_{i=1}^d p_i\right)^{1/d} < 1$, unless all the entries of $\mathbf{p}$ are equal. Hence, if all the entries of $\mathbf{x}$ are the same and the $p_i$ are not all equal, the coefficient of $(1 - \frac{s}{N})^n$ in the crude upper bound is exponential in $N$. If all the $p_i$ are equal, the coefficient is of the order $N^{(d-1)/4}$.

The main fact is that the coefficient of $(1 - \frac{s}{N})^n$ in the upper bound derived in (3.15) is linear in $N$, whereas the coefficient of $(1 - \frac{s}{N})^n$ in the crude upper bound is almost always exponential in $N$.
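The Stirling approximation in this remark can be checked numerically. The Python sketch compares, on the log scale, the exact crude coefficient $1/(2\sqrt{\pi(\mathbf{x})})$ for the multinomial $\pi$ at $\mathbf{x} = (N/d, \ldots, N/d)$ with the displayed approximation. It assumes $N$ is divisible by $d$, and the function names are our own. Working with logarithms avoids overflow when $\mathbf{p}$ is not uniform and the coefficient grows exponentially.

```python
import math
from typing import Sequence


def log_crude_exact(N: int, p: Sequence[float]) -> float:
    """log of 1 / (2 sqrt(pi(x))) at x = (N/d, ..., N/d), pi = Multinomial(N, p)."""
    d = len(p)
    if N % d:
        raise ValueError("N must be divisible by d")
    k = N // d
    log_pi = (math.lgamma(N + 1) - d * math.lgamma(k + 1)
              + k * sum(math.log(q) for q in p))
    return -math.log(2) - 0.5 * log_pi


def log_crude_stirling(N: int, p: Sequence[float]) -> float:
    """log of (2 pi N)^{(d-1)/4} / (2 d^{d/4}) * (d (prod p_i)^{1/d})^{-N/2}."""
    d = len(p)
    log_geo_mean = sum(math.log(q) for q in p) / d
    return ((d - 1) / 4 * math.log(2 * math.pi * N) - math.log(2)
            - d / 4 * math.log(d) - N / 2 * (math.log(d) + log_geo_mean))


for p in ([0.2] * 5, [0.1, 0.2, 0.2, 0.25, 0.25]):
    for N in (100, 1000):
        print(p, N, round(log_crude_exact(N, p), 4), round(log_crude_stirling(N, p), 4))
```

For uniform $\mathbf{p}$ the logarithm of the coefficient grows only like $\frac{d-1}{4}\log N$, while for the non-uniform vector it grows linearly in $N$, in line with the AM-GM argument above.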

Example 3.3.2. Consider the generalized Ehrenfest urn model where $N = 100$ balls are distributed in $d = 5$ urns. Suppose $s = 1$ ball is chosen and each urn is chosen with probability $p_i = 1/5$, $i = 1, 2, \ldots, 5$. Using (3.14) and (3.15), for the starting state $\mathbf{x} = (0, 20, 0, 20, 60)$, the bounds on the total variation distance are obtained as

$$0.25\left(1 - \frac{1}{100}\right)^n \le \|K_{\mathbf{x}}^n - \pi\|_{TV} \le 120\left(1 - \frac{1}{100}\right)^n. \tag{3.16}$$

For $\epsilon = 0.01$, (3.16) tells us that 321 steps are necessary and 935 steps are sufficient for the total variation distance to be less than 0.01. The crude upper bound for the total variation distance, $\frac{1}{2\sqrt{\pi(\mathbf{x})}}(1 - 1/100)^n = (1.02 \times 10^{15})(1 - 1/100)^n$, would have implied that 3897 steps are sufficient for the total variation distance to be less than 0.01.

¹Note that $N$ is the notation for the total number of balls in the urns, not the number of steps.
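As a check on Example 3.3.2, the following Python sketch recomputes the constants and step counts, taking the crude constant to be $\frac{1}{2\sqrt{\pi(\mathbf{x})}}$ with $\pi$ the multinomial distribution with parameters $N$ and $\mathbf{p}$ and $\lambda = 1 - s/N$. The helper names are our own; "necessary" and "sufficient" are read as in the earlier examples.

```python
import math
from typing import Sequence


def multinomial_logpmf(x: Sequence[int], p: Sequence[float]) -> float:
    """log pmf of the multinomial distribution with N = sum(x) trials."""
    N = sum(x)
    return (math.lgamma(N + 1) - sum(math.lgamma(v + 1) for v in x)
            + sum(v * math.log(q) for v, q in zip(x, p)))


def first_n_below(coef: float, lam: float, eps: float) -> int:
    """Smallest integer n >= 0 with coef * lam**n <= eps, for 0 < lam < 1."""
    return max(0, math.ceil(math.log(eps / coef) / math.log(lam)))


N, s, eps = 100, 1, 0.01
p = [0.2] * 5
x = (0, 20, 0, 20, 60)
lam = 1 - s / N
p_d, s_x = p[-1], sum(x[:-1])

lower_coef = abs(s_x - N * (1 - p_d)) / (2 * max(N * p_d, N * (1 - p_d)))  # (3.14)
upper_coef = s_x + N * (1 - p_d)                                             # (3.15)
crude_coef = 0.5 * math.exp(-0.5 * multinomial_logpmf(x, p))

print(lower_coef, upper_coef, f"{crude_coef:.3e}")
print(first_n_below(lower_coef, lam, eps),
      first_n_below(upper_coef, lam, eps),
      first_n_below(crude_coef, lam, eps))
```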

4. Discussion. We use a probabilistic technique based on a monotone coupling argument to analyze all the examples in this paper. We obtain reasonable upper and lower bounds for the total variation distance from an arbitrary starting point of the Markov chain, significantly broadening previous results in [9]. The analysis is simple to implement, requiring only the knowledge of a single eigenfunction and its corresponding eigenvalue. In addition, the analysis does not require the assumption of reversibility. As an illustration, we treat the nonreversible Moran model in Section 3.1. The next goal is to sharpen the bounds to obtain matching upper and lower bounds, and to generalize the techniques developed in this paper to continuous state spaces.

APPENDIX 

Lemma 1. If the mutation matrix $M$ is irreducible, then the transition density $K$ in (3.1) is irreducible and aperiodic.

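Proof. We first show irreducibility. Let $\mathbf{x} \in \mathcal{X}_N$ be arbitrarily chosen. Let $i \ne j$ be such that $1 \le i, j \le d$ and $x_i > 0$. By the irreducibility of $M$, there exists $n \in \mathbb{N}$ such that $(M^n)_{ij} > 0$. As a result, there exist $i = k_0, k_1, k_2, \ldots, k_{n-1}, k_n = j$ such that $\prod_{l=0}^{n-1} m_{k_l k_{l+1}} > 0$. Let $\mathbf{x}^0 = \mathbf{x}$, and $\mathbf{x}^l = \mathbf{x}^{l-1} + \mathbf{e}_{k_l} - \mathbf{e}_{k_{l-1}}$ for $1 \le l \le n$. Note that by construction, $x^{l-1}_{k_{l-1}} > 0$ (for $l = 1$ because $x_i > 0$, and for $l \ge 2$ because $\mathbf{x}^{l-1}$ was obtained by adding an individual of species $k_{l-1}$), which implies $\mathbf{x}^l \in \mathcal{X}_N$ for every $1 \le l \le n$. Hence,

$$K^n(\mathbf{x}, \mathbf{x} + \mathbf{e}_j - \mathbf{e}_i) = K^n(\mathbf{x}^0, \mathbf{x}^n) \ge \prod_{l=0}^{n-1} K(\mathbf{x}^l, \mathbf{x}^{l+1}) > 0.$$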
We have thus shown that if $\mathbf{x}$ and $\mathbf{y}$ are neighbors in $\mathcal{X}_N$, that is, if $\mathbf{y}$ can be obtained from $\mathbf{x}$ by removing an individual of one species and adding an individual of another, then there exists $n \in \mathbb{N}$ such that $K^n(\mathbf{x}, \mathbf{y}) > 0$. Since any two elements of $\mathcal{X}_N$ are connected by a path in which successive elements are neighbors, it follows that $K$ is irreducible.

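We now show aperiodicity. Since $M$ is irreducible, there exist $i, j$ such that $1 \le i \ne j \le d$ and $m_{ij} > 0$. If $\mathbf{x} \in \mathcal{X}_N$ is such that $x_i, x_j > 0$, then $K(\mathbf{x}, \mathbf{x}) > 0$, since with positive probability an individual of species $i$ reproduces, its offspring mutates to species $j$ (which has probability $m_{ij}$), and an individual of species $j$ dies.

Since $K$ is irreducible, and there exists at least one $\mathbf{x} \in \mathcal{X}_N$ such that $K(\mathbf{x}, \mathbf{x}) > 0$, it follows that $K$ is aperiodic. □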